Date: Thu, 18 Sep 2025 10:19:38 +0000
Message-ID: <20250918-rust-binder-v1-1-7a5559e8c6bb@google.com>
Subject: [PATCH] rust_binder: add Rust Binder driver
From: Alice Ryhl
To: Greg Kroah-Hartman
Cc: Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Joel Fernandes,
    Christian Brauner, Carlos Llamas, Suren Baghdasaryan, Miguel Ojeda,
    Boqun Feng, Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
    Trevor Gross, Danilo Krummrich, Lorenzo Stoakes, "Liam R. Howlett",
    Paul Moore, Serge Hallyn, linux-kernel@vger.kernel.org,
    rust-for-linux@vger.kernel.org, Wedson Almeida Filho, Matt Gilbride,
    Alice Ryhl
Content-Type: text/plain; charset="utf-8"

Please see the link to the original RFC (below) for motivation.

I did not include all of the tracepoints, as I felt that the mechanism for
making C access fields of Rust structs should be discussed on the list
separately. I also did not include support for building Rust Binder as a
module, since that requires exporting a number of additional symbols on the
C side.
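Since the Rust driver is meant to expose the same ioctl-based UAPI as the C
Binder driver, a trivial userspace check can confirm that it answers the
protocol-version query. The following is a minimal sketch (not part of this
patch); it assumes a binder device node at /dev/binder, and the exact path
depends on how binderfs is mounted on your system:

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/android/binder.h>

	int main(void)
	{
		/* Device path is an assumption; binderfs may place it elsewhere. */
		int fd = open("/dev/binder", O_RDWR | O_CLOEXEC);
		if (fd < 0) {
			perror("open(/dev/binder)");
			return 1;
		}

		struct binder_version v;
		if (ioctl(fd, BINDER_VERSION, &v) < 0) {
			perror("ioctl(BINDER_VERSION)");
			return 1;
		}

		/* Both drivers report BINDER_CURRENT_PROTOCOL_VERSION here. */
		printf("binder protocol version: %d\n", v.protocol_version);
		return 0;
	}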
Link: https://lore.kernel.org/r/20231101-rust-binder-v1-0-08ba9197f637@google.com
Co-developed-by: Wedson Almeida Filho
Signed-off-by: Wedson Almeida Filho
Co-developed-by: Matt Gilbride
Signed-off-by: Matt Gilbride
Signed-off-by: Alice Ryhl
Acked-by: Carlos Llamas
Acked-by: Paul Moore
---
 drivers/android/Kconfig                       |   15 +-
 drivers/android/Makefile                      |    1 +
 drivers/android/binder/Makefile               |    9 +
 drivers/android/binder/allocation.rs          |  602 +++++++++
 drivers/android/binder/context.rs             |  180 +++
 drivers/android/binder/deferred_close.rs      |  204 +++
 drivers/android/binder/defs.rs                |  182 +++
 drivers/android/binder/dummy.c                |   15 +
 drivers/android/binder/error.rs               |   99 ++
 drivers/android/binder/freeze.rs              |  388 ++++++
 drivers/android/binder/node.rs                | 1131 +++++++++++++++++
 drivers/android/binder/node/wrapper.rs        |   78 ++
 drivers/android/binder/page_range.rs          |  746 +++++++++++
 drivers/android/binder/page_range_helper.c    |   24 +
 drivers/android/binder/page_range_helper.h    |   15 +
 drivers/android/binder/process.rs             | 1696 +++++++++++++++++++++++++
 drivers/android/binder/range_alloc/array.rs   |  251 ++++
 drivers/android/binder/range_alloc/mod.rs     |  329 +++++
 drivers/android/binder/range_alloc/tree.rs    |  488 +++++++
 drivers/android/binder/rust_binder.h          |   23 +
 drivers/android/binder/rust_binder_events.c   |   59 +
 drivers/android/binder/rust_binder_events.h   |   36 +
 drivers/android/binder/rust_binder_internal.h |   87 ++
 drivers/android/binder/rust_binder_main.rs    |  627 +++++++++
 drivers/android/binder/rust_binderfs.c        |  850 +++++++++++++
 drivers/android/binder/stats.rs               |   89 ++
 drivers/android/binder/thread.rs              | 1596 +++++++++++++++++++++++
 drivers/android/binder/trace.rs               |   16 +
 drivers/android/binder/transaction.rs         |  456 +++++++
 include/uapi/linux/android/binder.h           |    2 +-
 rust/bindings/bindings_helper.h               |    8 +
 rust/helpers/binder.c                         |   26 +
 rust/helpers/helpers.c                        |    1 +
 rust/helpers/page.c                           |    8 +
 rust/helpers/security.c                       |   24 +
 rust/kernel/cred.rs                           |    6 +
 rust/kernel/page.rs                           |    6 +
 rust/kernel/security.rs                       |   37 +
 rust/uapi/uapi_helper.h                       |    1 +
 39 files changed, 10409 insertions(+), 2 deletions(-)

diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
index 75af3cf472c8f8a3b93698911c6115ff5692943a..e2e402c9d1759c81591473ad02ab7ad011bc61d0 100644
--- a/drivers/android/Kconfig
+++ b/drivers/android/Kconfig
@@ -14,6 +14,19 @@ config ANDROID_BINDER_IPC
 	  Android process, using Binder to identify, invoke and pass arguments
 	  between said processes.
 
+config ANDROID_BINDER_IPC_RUST
+	bool "Rust version of Android Binder IPC Driver"
+	depends on RUST && MMU && !ANDROID_BINDER_IPC
+	help
+	  This enables the Rust implementation of the Binder driver.
+
+	  Binder is used in Android for both communication between processes,
+	  and remote method invocation.
+
+	  This means one Android process can call a method/routine in another
+	  Android process, using Binder to identify, invoke and pass arguments
+	  between said processes.
+
 config ANDROID_BINDERFS
 	bool "Android Binderfs filesystem"
 	depends on ANDROID_BINDER_IPC
@@ -28,7 +41,7 @@ config ANDROID_BINDERFS
 
 config ANDROID_BINDER_DEVICES
 	string "Android Binder devices"
-	depends on ANDROID_BINDER_IPC
+	depends on ANDROID_BINDER_IPC || ANDROID_BINDER_IPC_RUST
 	default "binder,hwbinder,vndbinder"
 	help
 	  Default value for the binder.devices parameter.
diff --git a/drivers/android/Makefile b/drivers/android/Makefile index f422f91e026b2c421c6036f0d5c4286a9cebe8ee..e0c650d3898edeefd66c7d04c28= bd1d0f49a76c9 100644 --- a/drivers/android/Makefile +++ b/drivers/android/Makefile @@ -4,3 +4,4 @@ ccflags-y +=3D -I$(src) # needed for trace events obj-$(CONFIG_ANDROID_BINDERFS) +=3D binderfs.o obj-$(CONFIG_ANDROID_BINDER_IPC) +=3D binder.o binder_alloc.o binder_netli= nk.o obj-$(CONFIG_ANDROID_BINDER_ALLOC_KUNIT_TEST) +=3D tests/ +obj-$(CONFIG_ANDROID_BINDER_IPC_RUST) +=3D binder/ diff --git a/drivers/android/binder/Makefile b/drivers/android/binder/Makef= ile new file mode 100644 index 0000000000000000000000000000000000000000..b70f80894c74cc2212f720f5a8d= 874b7bf4778d5 --- /dev/null +++ b/drivers/android/binder/Makefile @@ -0,0 +1,9 @@ +# SPDX-License-Identifier: GPL-2.0-only +ccflags-y +=3D -I$(src) # needed for trace events + +obj-$(CONFIG_ANDROID_BINDER_IPC_RUST) +=3D rust_binder.o +rust_binder-$(CONFIG_ANDROID_BINDER_IPC_RUST) :=3D \ + rust_binder_main.o \ + rust_binderfs.o \ + rust_binder_events.o \ + page_range_helper.o diff --git a/drivers/android/binder/allocation.rs b/drivers/android/binder/= allocation.rs new file mode 100644 index 0000000000000000000000000000000000000000..7f65a9c3a0e58e07a7e6d4e7d7b= 185f73fb1aab8 --- /dev/null +++ b/drivers/android/binder/allocation.rs @@ -0,0 +1,602 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. + +use core::mem::{size_of, size_of_val, MaybeUninit}; +use core::ops::Range; + +use kernel::{ + bindings, + fs::file::{File, FileDescriptorReservation}, + prelude::*, + sync::{aref::ARef, Arc}, + transmute::{AsBytes, FromBytes}, + uaccess::UserSliceReader, + uapi, +}; + +use crate::{ + deferred_close::DeferredFdCloser, + defs::*, + node::{Node, NodeRef}, + process::Process, + DArc, +}; + +#[derive(Default)] +pub(crate) struct AllocationInfo { + /// Range within the allocation where we can find the offsets to the o= bject descriptors. + pub(crate) offsets: Option>, + /// The target node of the transaction this allocation is associated t= o. + /// Not set for replies. + pub(crate) target_node: Option, + /// When this allocation is dropped, call `pending_oneway_finished` on= the node. + /// + /// This is used to serialize oneway transaction on the same node. Bin= der guarantees that + /// oneway transactions to the same node are delivered sequentially in= the order they are sent. + pub(crate) oneway_node: Option>, + /// Zero the data in the buffer on free. + pub(crate) clear_on_free: bool, + /// List of files embedded in this transaction. + file_list: FileList, +} + +/// Represents an allocation that the kernel is currently using. +/// +/// When allocations are idle, the range allocator holds the data related = to them. +/// +/// # Invariants +/// +/// This allocation corresponds to an allocation in the range allocator, s= o the relevant pages are +/// marked in use in the page range. 
+pub(crate) struct Allocation { + pub(crate) offset: usize, + size: usize, + pub(crate) ptr: usize, + pub(crate) process: Arc, + allocation_info: Option, + free_on_drop: bool, + pub(crate) oneway_spam_detected: bool, + #[allow(dead_code)] + pub(crate) debug_id: usize, +} + +impl Allocation { + pub(crate) fn new( + process: Arc, + debug_id: usize, + offset: usize, + size: usize, + ptr: usize, + oneway_spam_detected: bool, + ) -> Self { + Self { + process, + offset, + size, + ptr, + debug_id, + oneway_spam_detected, + allocation_info: None, + free_on_drop: true, + } + } + + fn size_check(&self, offset: usize, size: usize) -> Result { + let overflow_fail =3D offset.checked_add(size).is_none(); + let cmp_size_fail =3D offset.wrapping_add(size) > self.size; + if overflow_fail || cmp_size_fail { + return Err(EFAULT); + } + Ok(()) + } + + pub(crate) fn copy_into( + &self, + reader: &mut UserSliceReader, + offset: usize, + size: usize, + ) -> Result { + self.size_check(offset, size)?; + + // SAFETY: While this object exists, the range allocator will keep= the range allocated, and + // in turn, the pages will be marked as in use. + unsafe { + self.process + .pages + .copy_from_user_slice(reader, self.offset + offset, size) + } + } + + pub(crate) fn read(&self, offset: usize) -> Result { + self.size_check(offset, size_of::())?; + + // SAFETY: While this object exists, the range allocator will keep= the range allocated, and + // in turn, the pages will be marked as in use. + unsafe { self.process.pages.read(self.offset + offset) } + } + + pub(crate) fn write(&self, offset: usize, obj: &T) -> Resul= t { + self.size_check(offset, size_of_val::(obj))?; + + // SAFETY: While this object exists, the range allocator will keep= the range allocated, and + // in turn, the pages will be marked as in use. + unsafe { self.process.pages.write(self.offset + offset, obj) } + } + + pub(crate) fn fill_zero(&self) -> Result { + // SAFETY: While this object exists, the range allocator will keep= the range allocated, and + // in turn, the pages will be marked as in use. + unsafe { self.process.pages.fill_zero(self.offset, self.size) } + } + + pub(crate) fn keep_alive(mut self) { + self.process + .buffer_make_freeable(self.offset, self.allocation_info.take()= ); + self.free_on_drop =3D false; + } + + pub(crate) fn set_info(&mut self, info: AllocationInfo) { + self.allocation_info =3D Some(info); + } + + pub(crate) fn get_or_init_info(&mut self) -> &mut AllocationInfo { + self.allocation_info.get_or_insert_with(Default::default) + } + + pub(crate) fn set_info_offsets(&mut self, offsets: Range) { + self.get_or_init_info().offsets =3D Some(offsets); + } + + pub(crate) fn set_info_oneway_node(&mut self, oneway_node: DArc)= { + self.get_or_init_info().oneway_node =3D Some(oneway_node); + } + + pub(crate) fn set_info_clear_on_drop(&mut self) { + self.get_or_init_info().clear_on_free =3D true; + } + + pub(crate) fn set_info_target_node(&mut self, target_node: NodeRef) { + self.get_or_init_info().target_node =3D Some(target_node); + } + + /// Reserve enough space to push at least `num_fds` fds. 
+ pub(crate) fn info_add_fd_reserve(&mut self, num_fds: usize) -> Result= { + self.get_or_init_info() + .file_list + .files_to_translate + .reserve(num_fds, GFP_KERNEL)?; + + Ok(()) + } + + pub(crate) fn info_add_fd( + &mut self, + file: ARef, + buffer_offset: usize, + close_on_free: bool, + ) -> Result { + self.get_or_init_info().file_list.files_to_translate.push( + FileEntry { + file, + buffer_offset, + close_on_free, + }, + GFP_KERNEL, + )?; + + Ok(()) + } + + pub(crate) fn set_info_close_on_free(&mut self, cof: FdsCloseOnFree) { + self.get_or_init_info().file_list.close_on_free =3D cof.0; + } + + pub(crate) fn translate_fds(&mut self) -> Result { + let file_list =3D match self.allocation_info.as_mut() { + Some(info) =3D> &mut info.file_list, + None =3D> return Ok(TranslatedFds::new()), + }; + + let files =3D core::mem::take(&mut file_list.files_to_translate); + + let num_close_on_free =3D files.iter().filter(|entry| entry.close_= on_free).count(); + let mut close_on_free =3D KVec::with_capacity(num_close_on_free, G= FP_KERNEL)?; + + let mut reservations =3D KVec::with_capacity(files.len(), GFP_KERN= EL)?; + for file_info in files { + let res =3D FileDescriptorReservation::get_unused_fd_flags(bin= dings::O_CLOEXEC)?; + let fd =3D res.reserved_fd(); + self.write::(file_info.buffer_offset, &fd)?; + + reservations.push( + Reservation { + res, + file: file_info.file, + }, + GFP_KERNEL, + )?; + if file_info.close_on_free { + close_on_free.push(fd, GFP_KERNEL)?; + } + } + + Ok(TranslatedFds { + reservations, + close_on_free: FdsCloseOnFree(close_on_free), + }) + } + + /// Should the looper return to userspace when freeing this allocation? + pub(crate) fn looper_need_return_on_free(&self) -> bool { + // Closing fds involves pushing task_work for execution when we re= turn to userspace. Hence, + // we should return to userspace asap if we are closing fds. + match self.allocation_info { + Some(ref info) =3D> !info.file_list.close_on_free.is_empty(), + None =3D> false, + } + } +} + +impl Drop for Allocation { + fn drop(&mut self) { + if !self.free_on_drop { + return; + } + + if let Some(mut info) =3D self.allocation_info.take() { + if let Some(oneway_node) =3D info.oneway_node.as_ref() { + oneway_node.pending_oneway_finished(); + } + + info.target_node =3D None; + + if let Some(offsets) =3D info.offsets.clone() { + let view =3D AllocationView::new(self, offsets.start); + for i in offsets.step_by(size_of::()) { + if view.cleanup_object(i).is_err() { + pr_warn!("Error cleaning up object at offset {}\n"= , i) + } + } + } + + for &fd in &info.file_list.close_on_free { + let closer =3D match DeferredFdCloser::new(GFP_KERNEL) { + Ok(closer) =3D> closer, + Err(kernel::alloc::AllocError) =3D> { + // Ignore allocation failures. + break; + } + }; + + // Here, we ignore errors. The operation can fail if the f= d is not valid, or if the + // method is called from a kthread. However, this is alway= s called from a syscall, + // so the latter case cannot happen, and we don't care abo= ut the first case. + let _ =3D closer.close_fd(fd); + } + + if info.clear_on_free { + if let Err(e) =3D self.fill_zero() { + pr_warn!("Failed to clear data on free: {:?}", e); + } + } + } + + self.process.buffer_raw_free(self.ptr); + } +} + +/// A wrapper around `Allocation` that is being created. +/// +/// If the allocation is destroyed while wrapped in this wrapper, then the= allocation will be +/// considered to be part of a failed transaction. 
Successful transactions= avoid that by calling +/// `success`, which skips the destructor. +#[repr(transparent)] +pub(crate) struct NewAllocation(pub(crate) Allocation); + +impl NewAllocation { + pub(crate) fn success(self) -> Allocation { + // This skips the destructor. + // + // SAFETY: This type is `#[repr(transparent)]`, so the layout matc= hes. + unsafe { core::mem::transmute(self) } + } +} + +impl core::ops::Deref for NewAllocation { + type Target =3D Allocation; + fn deref(&self) -> &Allocation { + &self.0 + } +} + +impl core::ops::DerefMut for NewAllocation { + fn deref_mut(&mut self) -> &mut Allocation { + &mut self.0 + } +} + +/// A view into the beginning of an allocation. +/// +/// All attempts to read or write outside of the view will fail. To intent= ionally access outside of +/// this view, use the `alloc` field of this struct directly. +pub(crate) struct AllocationView<'a> { + pub(crate) alloc: &'a mut Allocation, + limit: usize, +} + +impl<'a> AllocationView<'a> { + pub(crate) fn new(alloc: &'a mut Allocation, limit: usize) -> Self { + AllocationView { alloc, limit } + } + + pub(crate) fn read(&self, offset: usize) -> Result { + if offset.checked_add(size_of::()).ok_or(EINVAL)? > self.limit { + return Err(EINVAL); + } + self.alloc.read(offset) + } + + pub(crate) fn write(&self, offset: usize, obj: &T) -> Resu= lt { + if offset.checked_add(size_of::()).ok_or(EINVAL)? > self.limit { + return Err(EINVAL); + } + self.alloc.write(offset, obj) + } + + pub(crate) fn copy_into( + &self, + reader: &mut UserSliceReader, + offset: usize, + size: usize, + ) -> Result { + if offset.checked_add(size).ok_or(EINVAL)? > self.limit { + return Err(EINVAL); + } + self.alloc.copy_into(reader, offset, size) + } + + pub(crate) fn transfer_binder_object( + &self, + offset: usize, + obj: &uapi::flat_binder_object, + strong: bool, + node_ref: NodeRef, + ) -> Result { + let mut newobj =3D FlatBinderObject::default(); + let node =3D node_ref.node.clone(); + if Arc::ptr_eq(&node_ref.node.owner, &self.alloc.process) { + // The receiving process is the owner of the node, so send it = a binder object (instead + // of a handle). + let (ptr, cookie) =3D node.get_id(); + newobj.hdr.type_ =3D if strong { + BINDER_TYPE_BINDER + } else { + BINDER_TYPE_WEAK_BINDER + }; + newobj.flags =3D obj.flags; + newobj.__bindgen_anon_1.binder =3D ptr as _; + newobj.cookie =3D cookie as _; + self.write(offset, &newobj)?; + // Increment the user ref count on the node. It will be decrem= ented as part of the + // destruction of the buffer, when we see a binder or weak-bin= der object. + node.update_refcount(true, 1, strong); + } else { + // The receiving process is different from the owner, so we ne= ed to insert a handle to + // the binder object. + let handle =3D self + .alloc + .process + .as_arc_borrow() + .insert_or_update_handle(node_ref, false)?; + newobj.hdr.type_ =3D if strong { + BINDER_TYPE_HANDLE + } else { + BINDER_TYPE_WEAK_HANDLE + }; + newobj.flags =3D obj.flags; + newobj.__bindgen_anon_1.handle =3D handle; + if self.write(offset, &newobj).is_err() { + // Decrement ref count on the handle we just created. 
+ let _ =3D self + .alloc + .process + .as_arc_borrow() + .update_ref(handle, false, strong); + return Err(EINVAL); + } + } + + Ok(()) + } + + fn cleanup_object(&self, index_offset: usize) -> Result { + let offset =3D self.alloc.read(index_offset)?; + let header =3D self.read::(offset)?; + match header.type_ { + BINDER_TYPE_WEAK_BINDER | BINDER_TYPE_BINDER =3D> { + let obj =3D self.read::(offset)?; + let strong =3D header.type_ =3D=3D BINDER_TYPE_BINDER; + // SAFETY: The type is `BINDER_TYPE_{WEAK_}BINDER`, so the= `binder` field is + // populated. + let ptr =3D unsafe { obj.__bindgen_anon_1.binder }; + let cookie =3D obj.cookie; + self.alloc.process.update_node(ptr, cookie, strong); + Ok(()) + } + BINDER_TYPE_WEAK_HANDLE | BINDER_TYPE_HANDLE =3D> { + let obj =3D self.read::(offset)?; + let strong =3D header.type_ =3D=3D BINDER_TYPE_HANDLE; + // SAFETY: The type is `BINDER_TYPE_{WEAK_}HANDLE`, so the= `handle` field is + // populated. + let handle =3D unsafe { obj.__bindgen_anon_1.handle }; + self.alloc + .process + .as_arc_borrow() + .update_ref(handle, false, strong) + } + _ =3D> Ok(()), + } + } +} + +/// A binder object as it is serialized. +/// +/// # Invariants +/// +/// All bytes must be initialized, and the value of `self.hdr.type_` must = be one of the allowed +/// types. +#[repr(C)] +pub(crate) union BinderObject { + hdr: uapi::binder_object_header, + fbo: uapi::flat_binder_object, + fdo: uapi::binder_fd_object, + bbo: uapi::binder_buffer_object, + fdao: uapi::binder_fd_array_object, +} + +/// A view into a `BinderObject` that can be used in a match statement. +pub(crate) enum BinderObjectRef<'a> { + Binder(&'a mut uapi::flat_binder_object), + Handle(&'a mut uapi::flat_binder_object), + Fd(&'a mut uapi::binder_fd_object), + Ptr(&'a mut uapi::binder_buffer_object), + Fda(&'a mut uapi::binder_fd_array_object), +} + +impl BinderObject { + pub(crate) fn read_from(reader: &mut UserSliceReader) -> Result { + let object =3D Self::read_from_inner(|slice| { + let read_len =3D usize::min(slice.len(), reader.len()); + reader.clone_reader().read_slice(&mut slice[..read_len])?; + Ok(()) + })?; + + // If we used a object type smaller than the largest object size, = then we've read more + // bytes than we needed to. However, we used `.clone_reader()` to = avoid advancing the + // original reader. Now, we call `skip` so that the caller's reade= r is advanced by the + // right amount. + // + // The `skip` call fails if the reader doesn't have `size` bytes a= vailable. This could + // happen if the type header corresponds to an object type that is= larger than the rest of + // the reader. + // + // Any extra bytes beyond the size of the object are inaccessible = after this call, so + // reading them again from the `reader` later does not result in T= OCTOU bugs. + reader.skip(object.size())?; + + Ok(object) + } + + /// Use the provided reader closure to construct a `BinderObject`. + /// + /// The closure should write the bytes for the object into the provide= d slice. + pub(crate) fn read_from_inner(reader: R) -> Result + where + R: FnOnce(&mut [u8; size_of::()]) -> Result<()>, + { + let mut obj =3D MaybeUninit::::zeroed(); + + // SAFETY: The lengths of `BinderObject` and `[u8; size_of::()]` are equal, + // and the byte array has an alignment requirement of one, so the = pointer cast is okay. + // Additionally, `obj` was initialized to zeros, so the byte array= will not be + // uninitialized. 
+ (reader)(unsafe { &mut *obj.as_mut_ptr().cast() })?; + + // SAFETY: The entire object is initialized, so accessing this fie= ld is safe. + let type_ =3D unsafe { obj.assume_init_ref().hdr.type_ }; + if Self::type_to_size(type_).is_none() { + // The value of `obj.hdr_type_` was invalid. + return Err(EINVAL); + } + + // SAFETY: All bytes are initialized (since we zeroed them at the = start) and we checked + // that `self.hdr.type_` is one of the allowed types, so the type = invariants are satisfied. + unsafe { Ok(obj.assume_init()) } + } + + pub(crate) fn as_ref(&mut self) -> BinderObjectRef<'_> { + use BinderObjectRef::*; + // SAFETY: The constructor ensures that all bytes of `self` are in= itialized, and all + // variants of this union accept all initialized bit patterns. + unsafe { + match self.hdr.type_ { + BINDER_TYPE_WEAK_BINDER | BINDER_TYPE_BINDER =3D> Binder(&= mut self.fbo), + BINDER_TYPE_WEAK_HANDLE | BINDER_TYPE_HANDLE =3D> Handle(&= mut self.fbo), + BINDER_TYPE_FD =3D> Fd(&mut self.fdo), + BINDER_TYPE_PTR =3D> Ptr(&mut self.bbo), + BINDER_TYPE_FDA =3D> Fda(&mut self.fdao), + // SAFETY: By the type invariant, the value of `self.hdr.t= ype_` cannot have any + // other value than the ones checked above. + _ =3D> core::hint::unreachable_unchecked(), + } + } + } + + pub(crate) fn size(&self) -> usize { + // SAFETY: The entire object is initialized, so accessing this fie= ld is safe. + let type_ =3D unsafe { self.hdr.type_ }; + + // SAFETY: The type invariants guarantee that the type field is co= rrect. + unsafe { Self::type_to_size(type_).unwrap_unchecked() } + } + + fn type_to_size(type_: u32) -> Option { + match type_ { + BINDER_TYPE_WEAK_BINDER =3D> Some(size_of::()), + BINDER_TYPE_BINDER =3D> Some(size_of::()), + BINDER_TYPE_WEAK_HANDLE =3D> Some(size_of::()), + BINDER_TYPE_HANDLE =3D> Some(size_of::()), + BINDER_TYPE_FD =3D> Some(size_of::()), + BINDER_TYPE_PTR =3D> Some(size_of::()), + BINDER_TYPE_FDA =3D> Some(size_of::()), + _ =3D> None, + } + } +} + +#[derive(Default)] +struct FileList { + files_to_translate: KVec, + close_on_free: KVec, +} + +struct FileEntry { + /// The file for which a descriptor will be created in the recipient p= rocess. + file: ARef, + /// The offset in the buffer where the file descriptor is stored. + buffer_offset: usize, + /// Whether this fd should be closed when the allocation is freed. + close_on_free: bool, +} + +pub(crate) struct TranslatedFds { + reservations: KVec, + /// If commit is called, then these fds should be closed. (If commit i= s not called, then they + /// shouldn't be closed.) + close_on_free: FdsCloseOnFree, +} + +struct Reservation { + res: FileDescriptorReservation, + file: ARef, +} + +impl TranslatedFds { + pub(crate) fn new() -> Self { + Self { + reservations: KVec::new(), + close_on_free: FdsCloseOnFree(KVec::new()), + } + } + + pub(crate) fn commit(self) -> FdsCloseOnFree { + for entry in self.reservations { + entry.res.fd_install(entry.file); + } + + self.close_on_free + } +} + +pub(crate) struct FdsCloseOnFree(KVec); diff --git a/drivers/android/binder/context.rs b/drivers/android/binder/con= text.rs new file mode 100644 index 0000000000000000000000000000000000000000..3d135ec03ca74d7dd6f3d678073= 75eadd4f70fe8 --- /dev/null +++ b/drivers/android/binder/context.rs @@ -0,0 +1,180 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. 
+ +use kernel::{ + error::Error, + list::{List, ListArc, ListLinks}, + prelude::*, + security, + str::{CStr, CString}, + sync::{Arc, Mutex}, + task::Kuid, +}; + +use crate::{error::BinderError, node::NodeRef, process::Process}; + +kernel::sync::global_lock! { + // SAFETY: We call `init` in the module initializer, so it's initializ= ed before first use. + pub(crate) unsafe(uninit) static CONTEXTS: Mutex =3D Cont= extList { + list: List::new(), + }; +} + +pub(crate) struct ContextList { + list: List, +} + +pub(crate) fn get_all_contexts() -> Result>> { + let lock =3D CONTEXTS.lock(); + + let count =3D lock.list.iter().count(); + + let mut ctxs =3D KVec::with_capacity(count, GFP_KERNEL)?; + for ctx in &lock.list { + ctxs.push(Arc::from(ctx), GFP_KERNEL)?; + } + Ok(ctxs) +} + +/// This struct keeps track of the processes using this context, and which= process is the context +/// manager. +struct Manager { + node: Option, + uid: Option, + all_procs: List, +} + +/// There is one context per binder file (/dev/binder, /dev/hwbinder, etc) +#[pin_data] +pub(crate) struct Context { + #[pin] + manager: Mutex, + pub(crate) name: CString, + #[pin] + links: ListLinks, +} + +kernel::list::impl_list_arc_safe! { + impl ListArcSafe<0> for Context { untracked; } +} +kernel::list::impl_list_item! { + impl ListItem<0> for Context { + using ListLinks { self.links }; + } +} + +impl Context { + pub(crate) fn new(name: &CStr) -> Result> { + let name =3D CString::try_from(name)?; + let list_ctx =3D ListArc::pin_init::( + try_pin_init!(Context { + name, + links <- ListLinks::new(), + manager <- kernel::new_mutex!(Manager { + all_procs: List::new(), + node: None, + uid: None, + }, "Context::manager"), + }), + GFP_KERNEL, + )?; + + let ctx =3D list_ctx.clone_arc(); + CONTEXTS.lock().list.push_back(list_ctx); + + Ok(ctx) + } + + /// Called when the file for this context is unlinked. + /// + /// No-op if called twice. + pub(crate) fn deregister(&self) { + // SAFETY: We never add the context to any other linked list than = this one, so it is either + // in this list, or not in any list. + unsafe { CONTEXTS.lock().list.remove(self) }; + } + + pub(crate) fn register_process(self: &Arc, proc: ListArc) { + if !Arc::ptr_eq(self, &proc.ctx) { + pr_err!("Context::register_process called on the wrong context= ."); + return; + } + self.manager.lock().all_procs.push_back(proc); + } + + pub(crate) fn deregister_process(self: &Arc, proc: &Process) { + if !Arc::ptr_eq(self, &proc.ctx) { + pr_err!("Context::deregister_process called on the wrong conte= xt."); + return; + } + // SAFETY: We just checked that this is the right list. + unsafe { self.manager.lock().all_procs.remove(proc) }; + } + + pub(crate) fn set_manager_node(&self, node_ref: NodeRef) -> Result { + let mut manager =3D self.manager.lock(); + if manager.node.is_some() { + pr_warn!("BINDER_SET_CONTEXT_MGR already set"); + return Err(EBUSY); + } + security::binder_set_context_mgr(&node_ref.node.owner.cred)?; + + // If the context manager has been set before, ensure that we use = the same euid. 
+ let caller_uid =3D Kuid::current_euid(); + if let Some(ref uid) =3D manager.uid { + if *uid !=3D caller_uid { + return Err(EPERM); + } + } + + manager.node =3D Some(node_ref); + manager.uid =3D Some(caller_uid); + Ok(()) + } + + pub(crate) fn unset_manager_node(&self) { + let node_ref =3D self.manager.lock().node.take(); + drop(node_ref); + } + + pub(crate) fn get_manager_node(&self, strong: bool) -> Result { + self.manager + .lock() + .node + .as_ref() + .ok_or_else(BinderError::new_dead)? + .clone(strong) + .map_err(BinderError::from) + } + + pub(crate) fn for_each_proc(&self, mut func: F) + where + F: FnMut(&Process), + { + let lock =3D self.manager.lock(); + for proc in &lock.all_procs { + func(&proc); + } + } + + pub(crate) fn get_all_procs(&self) -> Result>> { + let lock =3D self.manager.lock(); + let count =3D lock.all_procs.iter().count(); + + let mut procs =3D KVec::with_capacity(count, GFP_KERNEL)?; + for proc in &lock.all_procs { + procs.push(Arc::from(proc), GFP_KERNEL)?; + } + Ok(procs) + } + + pub(crate) fn get_procs_with_pid(&self, pid: i32) -> Result>> { + let orig =3D self.get_all_procs()?; + let mut backing =3D KVec::with_capacity(orig.len(), GFP_KERNEL)?; + for proc in orig.into_iter().filter(|proc| proc.task.pid() =3D=3D = pid) { + backing.push(proc, GFP_KERNEL)?; + } + Ok(backing) + } +} diff --git a/drivers/android/binder/deferred_close.rs b/drivers/android/bin= der/deferred_close.rs new file mode 100644 index 0000000000000000000000000000000000000000..ac895c04d0cb7e867a36ba583fa= cdecef10ca224 --- /dev/null +++ b/drivers/android/binder/deferred_close.rs @@ -0,0 +1,204 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. + +//! Logic for closing files in a deferred manner. +//! +//! This file could make sense to have in `kernel::fs`, but it was rejecte= d for being too +//! Binder-specific. + +use core::mem::MaybeUninit; +use kernel::{ + alloc::{AllocError, Flags}, + bindings, + prelude::*, +}; + +/// Helper used for closing file descriptors in a way that is safe even if= the file is currently +/// held using `fdget`. +/// +/// Additional motivation can be found in commit 80cd795630d6 ("binder: fi= x use-after-free due to +/// ksys_close() during fdget()") and in the comments on `binder_do_fd_clo= se`. +pub(crate) struct DeferredFdCloser { + inner: KBox, +} + +/// SAFETY: This just holds an allocation with no real content, so there's= no safety issue with +/// moving it across threads. +unsafe impl Send for DeferredFdCloser {} +/// SAFETY: This just holds an allocation with no real content, so there's= no safety issue with +/// moving it across threads. +unsafe impl Sync for DeferredFdCloser {} + +/// # Invariants +/// +/// If the `file` pointer is non-null, then it points at a `struct file` a= nd owns a refcount to +/// that file. +#[repr(C)] +struct DeferredFdCloserInner { + twork: MaybeUninit, + file: *mut bindings::file, +} + +impl DeferredFdCloser { + /// Create a new [`DeferredFdCloser`]. + pub(crate) fn new(flags: Flags) -> Result { + Ok(Self { + // INVARIANT: The `file` pointer is null, so the type invarian= t does not apply. + inner: KBox::new( + DeferredFdCloserInner { + twork: MaybeUninit::uninit(), + file: core::ptr::null_mut(), + }, + flags, + )?, + }) + } + + /// Schedule a task work that closes the file descriptor when this tas= k returns to userspace. + /// + /// Fails if this is called from a context where we cannot run work wh= en returning to + /// userspace. (E.g., from a kthread.) 
+ pub(crate) fn close_fd(self, fd: u32) -> Result<(), DeferredFdCloseErr= or> { + use bindings::task_work_notify_mode_TWA_RESUME as TWA_RESUME; + + // In this method, we schedule the task work before closing the fi= le. This is because + // scheduling a task work is fallible, and we need to know whether= it will fail before we + // attempt to close the file. + + // Task works are not available on kthreads. + let current =3D kernel::current!(); + + // Check if this is a kthread. + // SAFETY: Reading `flags` from a task is always okay. + if unsafe { ((*current.as_ptr()).flags & bindings::PF_KTHREAD) != =3D 0 } { + return Err(DeferredFdCloseError::TaskWorkUnavailable); + } + + // Transfer ownership of the box's allocation to a raw pointer. Th= is disables the + // destructor, so we must manually convert it back to a KBox to dr= op it. + // + // Until we convert it back to a `KBox`, there are no aliasing req= uirements on this + // pointer. + let inner =3D KBox::into_raw(self.inner); + + // The `callback_head` field is first in the struct, so this cast = correctly gives us a + // pointer to the field. + let callback_head =3D inner.cast::(); + // SAFETY: This pointer offset operation does not go out-of-bounds. + let file_field =3D unsafe { core::ptr::addr_of_mut!((*inner).file)= }; + + let current =3D current.as_ptr(); + + // SAFETY: This function currently has exclusive access to the `De= ferredFdCloserInner`, so + // it is okay for us to perform unsynchronized writes to its `call= back_head` field. + unsafe { bindings::init_task_work(callback_head, Some(Self::do_clo= se_fd)) }; + + // SAFETY: This inserts the `DeferredFdCloserInner` into the task = workqueue for the current + // task. If this operation is successful, then this transfers excl= usive ownership of the + // `callback_head` field to the C side until it calls `do_close_fd= `, and we don't touch or + // invalidate the field during that time. + // + // When the C side calls `do_close_fd`, the safety requirements of= that method are + // satisfied because when a task work is executed, the callback is= given ownership of the + // pointer. + // + // The file pointer is currently null. If it is changed to be non-= null before `do_close_fd` + // is called, then that change happens due to the write at the end= of this function, and + // that write has a safety comment that explains why the refcount = can be dropped when + // `do_close_fd` runs. + let res =3D unsafe { bindings::task_work_add(current, callback_hea= d, TWA_RESUME) }; + + if res !=3D 0 { + // SAFETY: Scheduling the task work failed, so we still have o= wnership of the box, so + // we may destroy it. + unsafe { drop(KBox::from_raw(inner)) }; + + return Err(DeferredFdCloseError::TaskWorkUnavailable); + } + + // This removes the fd from the fd table in `current`. The file is= not fully closed until + // `filp_close` is called. We are given ownership of one refcount = to the file. + // + // SAFETY: This is safe no matter what `fd` is. If the `fd` is val= id (that is, if the + // pointer is non-null), then we call `filp_close` on the returned= pointer as required by + // `file_close_fd`. + let file =3D unsafe { bindings::file_close_fd(fd) }; + if file.is_null() { + // We don't clean up the task work since that might be expensi= ve if the task work queue + // is long. Just let it execute and let it clean up for itself. + return Err(DeferredFdCloseError::BadFd); + } + + // Acquire a second refcount to the file. 
+ // + // SAFETY: The `file` pointer points at a file with a non-zero ref= count. + unsafe { bindings::get_file(file) }; + + // This method closes the fd, consuming one of our two refcounts. = There could be active + // light refcounts created from that fd, so we must ensure that th= e file has a positive + // refcount for the duration of those active light refcounts. We d= o that by holding on to + // the second refcount until the current task returns to userspace. + // + // SAFETY: The `file` pointer is valid. Passing `current->files` a= s the file table to close + // it in is correct, since we just got the `fd` from `file_close_f= d` which also uses + // `current->files`. + // + // Note: fl_owner_t is currently a void pointer. + unsafe { bindings::filp_close(file, (*current).files as bindings::= fl_owner_t) }; + + // We update the file pointer that the task work is supposed to fp= ut. This transfers + // ownership of our last refcount. + // + // INVARIANT: This changes the `file` field of a `DeferredFdCloser= Inner` from null to + // non-null. This doesn't break the type invariant for `DeferredFd= CloserInner` because we + // still own a refcount to the file, so we can pass ownership of t= hat refcount to the + // `DeferredFdCloserInner`. + // + // When `do_close_fd` runs, it must be safe for it to `fput` the r= efcount. However, this is + // the case because all light refcounts that are associated with t= he fd we closed + // previously must be dropped when `do_close_fd`, since light refc= ounts must be dropped + // before returning to userspace. + // + // SAFETY: Task works are executed on the current thread right bef= ore we return to + // userspace, so this write is guaranteed to happen before `do_clo= se_fd` is called, which + // means that a race is not possible here. + unsafe { *file_field =3D file }; + + Ok(()) + } + + /// # Safety + /// + /// The provided pointer must point at the `twork` field of a `Deferre= dFdCloserInner` stored in + /// a `KBox`, and the caller must pass exclusive ownership of that `KB= ox`. Furthermore, if the + /// file pointer is non-null, then it must be okay to release the refc= ount by calling `fput`. + unsafe extern "C" fn do_close_fd(inner: *mut bindings::callback_head) { + // SAFETY: The caller just passed us ownership of this box. + let inner =3D unsafe { KBox::from_raw(inner.cast::()) }; + if !inner.file.is_null() { + // SAFETY: By the type invariants, we own a refcount to this f= ile, and the caller + // guarantees that dropping the refcount now is okay. + unsafe { bindings::fput(inner.file) }; + } + // The allocation is freed when `inner` goes out of scope. + } +} + +/// Represents a failure to close an fd in a deferred manner. +#[derive(Copy, Clone, Debug, Eq, PartialEq)] +pub(crate) enum DeferredFdCloseError { + /// Closing the fd failed because we were unable to schedule a task wo= rk. + TaskWorkUnavailable, + /// Closing the fd failed because the fd does not exist. + BadFd, +} + +impl From for Error { + fn from(err: DeferredFdCloseError) -> Error { + match err { + DeferredFdCloseError::TaskWorkUnavailable =3D> ESRCH, + DeferredFdCloseError::BadFd =3D> EBADF, + } + } +} diff --git a/drivers/android/binder/defs.rs b/drivers/android/binder/defs.rs new file mode 100644 index 0000000000000000000000000000000000000000..33f51b4139c7e03184369dc7cd3= fc8b464dee012 --- /dev/null +++ b/drivers/android/binder/defs.rs @@ -0,0 +1,182 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. 
+ +use core::mem::MaybeUninit; +use core::ops::{Deref, DerefMut}; +use kernel::{ + transmute::{AsBytes, FromBytes}, + uapi::{self, *}, +}; + +macro_rules! pub_no_prefix { + ($prefix:ident, $($newname:ident),+ $(,)?) =3D> { + $(pub(crate) const $newname: u32 =3D kernel::macros::concat_idents= !($prefix, $newname);)+ + }; +} + +pub_no_prefix!( + binder_driver_return_protocol_, + BR_TRANSACTION, + BR_TRANSACTION_SEC_CTX, + BR_REPLY, + BR_DEAD_REPLY, + BR_FAILED_REPLY, + BR_FROZEN_REPLY, + BR_NOOP, + BR_SPAWN_LOOPER, + BR_TRANSACTION_COMPLETE, + BR_TRANSACTION_PENDING_FROZEN, + BR_ONEWAY_SPAM_SUSPECT, + BR_OK, + BR_ERROR, + BR_INCREFS, + BR_ACQUIRE, + BR_RELEASE, + BR_DECREFS, + BR_DEAD_BINDER, + BR_CLEAR_DEATH_NOTIFICATION_DONE, + BR_FROZEN_BINDER, + BR_CLEAR_FREEZE_NOTIFICATION_DONE, +); + +pub_no_prefix!( + binder_driver_command_protocol_, + BC_TRANSACTION, + BC_TRANSACTION_SG, + BC_REPLY, + BC_REPLY_SG, + BC_FREE_BUFFER, + BC_ENTER_LOOPER, + BC_EXIT_LOOPER, + BC_REGISTER_LOOPER, + BC_INCREFS, + BC_ACQUIRE, + BC_RELEASE, + BC_DECREFS, + BC_INCREFS_DONE, + BC_ACQUIRE_DONE, + BC_REQUEST_DEATH_NOTIFICATION, + BC_CLEAR_DEATH_NOTIFICATION, + BC_DEAD_BINDER_DONE, + BC_REQUEST_FREEZE_NOTIFICATION, + BC_CLEAR_FREEZE_NOTIFICATION, + BC_FREEZE_NOTIFICATION_DONE, +); + +pub_no_prefix!( + flat_binder_object_flags_, + FLAT_BINDER_FLAG_ACCEPTS_FDS, + FLAT_BINDER_FLAG_TXN_SECURITY_CTX +); + +pub_no_prefix!( + transaction_flags_, + TF_ONE_WAY, + TF_ACCEPT_FDS, + TF_CLEAR_BUF, + TF_UPDATE_TXN +); + +pub(crate) use uapi::{ + BINDER_TYPE_BINDER, BINDER_TYPE_FD, BINDER_TYPE_FDA, BINDER_TYPE_HANDL= E, BINDER_TYPE_PTR, + BINDER_TYPE_WEAK_BINDER, BINDER_TYPE_WEAK_HANDLE, +}; + +macro_rules! decl_wrapper { + ($newname:ident, $wrapped:ty) =3D> { + // Define a wrapper around the C type. Use `MaybeUninit` to enforc= e that the value of + // padding bytes must be preserved. + #[derive(Copy, Clone)] + #[repr(transparent)] + pub(crate) struct $newname(MaybeUninit<$wrapped>); + + // SAFETY: This macro is only used with types where this is ok. + unsafe impl FromBytes for $newname {} + // SAFETY: This macro is only used with types where this is ok. + unsafe impl AsBytes for $newname {} + + impl Deref for $newname { + type Target =3D $wrapped; + fn deref(&self) -> &Self::Target { + // SAFETY: We use `MaybeUninit` only to preserve padding. = The value must still + // always be valid. + unsafe { self.0.assume_init_ref() } + } + } + + impl DerefMut for $newname { + fn deref_mut(&mut self) -> &mut Self::Target { + // SAFETY: We use `MaybeUninit` only to preserve padding. = The value must still + // always be valid. + unsafe { self.0.assume_init_mut() } + } + } + + impl Default for $newname { + fn default() -> Self { + // Create a new value of this type where all bytes (includ= ing padding) are zeroed. 
+ Self(MaybeUninit::zeroed()) + } + } + }; +} + +decl_wrapper!(BinderNodeDebugInfo, uapi::binder_node_debug_info); +decl_wrapper!(BinderNodeInfoForRef, uapi::binder_node_info_for_ref); +decl_wrapper!(FlatBinderObject, uapi::flat_binder_object); +decl_wrapper!(BinderFdObject, uapi::binder_fd_object); +decl_wrapper!(BinderFdArrayObject, uapi::binder_fd_array_object); +decl_wrapper!(BinderObjectHeader, uapi::binder_object_header); +decl_wrapper!(BinderBufferObject, uapi::binder_buffer_object); +decl_wrapper!(BinderTransactionData, uapi::binder_transaction_data); +decl_wrapper!( + BinderTransactionDataSecctx, + uapi::binder_transaction_data_secctx +); +decl_wrapper!(BinderTransactionDataSg, uapi::binder_transaction_data_sg); +decl_wrapper!(BinderWriteRead, uapi::binder_write_read); +decl_wrapper!(BinderVersion, uapi::binder_version); +decl_wrapper!(BinderFrozenStatusInfo, uapi::binder_frozen_status_info); +decl_wrapper!(BinderFreezeInfo, uapi::binder_freeze_info); +decl_wrapper!(BinderFrozenStateInfo, uapi::binder_frozen_state_info); +decl_wrapper!(BinderHandleCookie, uapi::binder_handle_cookie); +decl_wrapper!(ExtendedError, uapi::binder_extended_error); + +impl BinderVersion { + pub(crate) fn current() -> Self { + Self(MaybeUninit::new(uapi::binder_version { + protocol_version: BINDER_CURRENT_PROTOCOL_VERSION as _, + })) + } +} + +impl BinderTransactionData { + pub(crate) fn with_buffers_size(self, buffers_size: u64) -> BinderTran= sactionDataSg { + BinderTransactionDataSg(MaybeUninit::new(uapi::binder_transaction_= data_sg { + transaction_data: *self, + buffers_size, + })) + } +} + +impl BinderTransactionDataSecctx { + /// View the inner data as wrapped in `BinderTransactionData`. + pub(crate) fn tr_data(&mut self) -> &mut BinderTransactionData { + // SAFETY: Transparent wrapper is safe to transmute. + unsafe { + &mut *(&mut self.transaction_data as *mut uapi::binder_transac= tion_data + as *mut BinderTransactionData) + } + } +} + +impl ExtendedError { + pub(crate) fn new(id: u32, command: u32, param: i32) -> Self { + Self(MaybeUninit::new(uapi::binder_extended_error { + id, + command, + param, + })) + } +} diff --git a/drivers/android/binder/dummy.c b/drivers/android/binder/dummy.c new file mode 100644 index 0000000000000000000000000000000000000000..7e9f6ea3a474b59f11e723a709c= 0c21e8b8beae0 --- /dev/null +++ b/drivers/android/binder/dummy.c @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include + +static int __init rbd_init(void) +{ + pr_warn("Using Rust Binder dummy module"); + return 0; +} + +module_init(rbd_init); +MODULE_DESCRIPTION("Dummy Rust Binder module"); +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Alice Ryhl "); diff --git a/drivers/android/binder/error.rs b/drivers/android/binder/error= .rs new file mode 100644 index 0000000000000000000000000000000000000000..9921827267d0d679f5aebb586ec= f190efe7c6405 --- /dev/null +++ b/drivers/android/binder/error.rs @@ -0,0 +1,99 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. + +use kernel::prelude::*; + +use crate::defs::*; + +pub(crate) type BinderResult =3D core::result::Result; + +/// An error that will be returned to userspace via the `BINDER_WRITE_READ= ` ioctl rather than via +/// errno. 
+pub(crate) struct BinderError { + pub(crate) reply: u32, + source: Option, +} + +impl BinderError { + pub(crate) fn new_dead() -> Self { + Self { + reply: BR_DEAD_REPLY, + source: None, + } + } + + pub(crate) fn new_frozen() -> Self { + Self { + reply: BR_FROZEN_REPLY, + source: None, + } + } + + pub(crate) fn new_frozen_oneway() -> Self { + Self { + reply: BR_TRANSACTION_PENDING_FROZEN, + source: None, + } + } + + pub(crate) fn is_dead(&self) -> bool { + self.reply =3D=3D BR_DEAD_REPLY + } + + pub(crate) fn as_errno(&self) -> kernel::ffi::c_int { + self.source.unwrap_or(EINVAL).to_errno() + } + + pub(crate) fn should_pr_warn(&self) -> bool { + self.source.is_some() + } +} + +/// Convert an errno into a `BinderError` and store the errno used to cons= truct it. The errno +/// should be stored as the thread's extended error when given to userspac= e. +impl From for BinderError { + fn from(source: Error) -> Self { + Self { + reply: BR_FAILED_REPLY, + source: Some(source), + } + } +} + +impl From for BinderError { + fn from(source: kernel::fs::file::BadFdError) -> Self { + BinderError::from(Error::from(source)) + } +} + +impl From for BinderError { + fn from(_: kernel::alloc::AllocError) -> Self { + Self { + reply: BR_FAILED_REPLY, + source: Some(ENOMEM), + } + } +} + +impl core::fmt::Debug for BinderError { + fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result { + match self.reply { + BR_FAILED_REPLY =3D> match self.source.as_ref() { + Some(source) =3D> f + .debug_struct("BR_FAILED_REPLY") + .field("source", source) + .finish(), + None =3D> f.pad("BR_FAILED_REPLY"), + }, + BR_DEAD_REPLY =3D> f.pad("BR_DEAD_REPLY"), + BR_FROZEN_REPLY =3D> f.pad("BR_FROZEN_REPLY"), + BR_TRANSACTION_PENDING_FROZEN =3D> f.pad("BR_TRANSACTION_PENDI= NG_FROZEN"), + BR_TRANSACTION_COMPLETE =3D> f.pad("BR_TRANSACTION_COMPLETE"), + _ =3D> f + .debug_struct("BinderError") + .field("reply", &self.reply) + .finish(), + } + } +} diff --git a/drivers/android/binder/freeze.rs b/drivers/android/binder/free= ze.rs new file mode 100644 index 0000000000000000000000000000000000000000..e68c3c8bc55a203c32261c23915= d8c427569e3b0 --- /dev/null +++ b/drivers/android/binder/freeze.rs @@ -0,0 +1,388 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. + +use kernel::{ + alloc::AllocError, + list::ListArc, + prelude::*, + rbtree::{self, RBTreeNodeReservation}, + seq_file::SeqFile, + seq_print, + sync::{Arc, UniqueArc}, + uaccess::UserSliceReader, +}; + +use crate::{ + defs::*, node::Node, process::Process, thread::Thread, BinderReturnWri= ter, DArc, DLArc, + DTRWrap, DeliverToRead, +}; + +#[derive(Clone, Copy, Eq, PartialEq, Ord, PartialOrd)] +pub(crate) struct FreezeCookie(u64); + +/// Represents a listener for changes to the frozen state of a process. +pub(crate) struct FreezeListener { + /// The node we are listening for. + pub(crate) node: DArc, + /// The cookie of this freeze listener. + cookie: FreezeCookie, + /// What value of `is_frozen` did we most recently tell userspace abou= t? + last_is_frozen: Option, + /// We sent a `BR_FROZEN_BINDER` and we are waiting for `BC_FREEZE_NOT= IFICATION_DONE` before + /// sending any other commands. + is_pending: bool, + /// Userspace sent `BC_CLEAR_FREEZE_NOTIFICATION` and we need to reply= with + /// `BR_CLEAR_FREEZE_NOTIFICATION_DONE` as soon as possible. If `is_pe= nding` is set, then we + /// must wait for it to be unset before we can reply. 
+ is_clearing: bool, + /// Number of cleared duplicates that can't be deleted until userspace= sends + /// `BC_FREEZE_NOTIFICATION_DONE`. + num_pending_duplicates: u64, + /// Number of cleared duplicates that can be deleted. + num_cleared_duplicates: u64, +} + +impl FreezeListener { + /// Is it okay to create a new listener with the same cookie as this o= ne for the provided node? + /// + /// Under some scenarios, userspace may delete a freeze listener and i= mmediately recreate it + /// with the same cookie. This results in duplicate listeners. To avoi= d issues with ambiguity, + /// we allow this only if the new listener is for the same node, and w= e also require that the + /// old listener has already been cleared. + fn allow_duplicate(&self, node: &DArc) -> bool { + Arc::ptr_eq(&self.node, node) && self.is_clearing + } +} + +type UninitFM =3D UniqueArc>= >; + +/// Represents a notification that the freeze state has changed. +pub(crate) struct FreezeMessage { + cookie: FreezeCookie, +} + +kernel::list::impl_list_arc_safe! { + impl ListArcSafe<0> for FreezeMessage { + untracked; + } +} + +impl FreezeMessage { + fn new(flags: kernel::alloc::Flags) -> Result { + UniqueArc::new_uninit(flags) + } + + fn init(ua: UninitFM, cookie: FreezeCookie) -> DLArc { + match ua.pin_init_with(DTRWrap::new(FreezeMessage { cookie })) { + Ok(msg) =3D> ListArc::from(msg), + Err(err) =3D> match err {}, + } + } +} + +impl DeliverToRead for FreezeMessage { + fn do_work( + self: DArc, + thread: &Thread, + writer: &mut BinderReturnWriter<'_>, + ) -> Result { + let _removed_listener; + let mut node_refs =3D thread.process.node_refs.lock(); + let Some(mut freeze_entry) =3D node_refs.freeze_listeners.find_mut= (&self.cookie) else { + return Ok(true); + }; + let freeze =3D freeze_entry.get_mut(); + + if freeze.num_cleared_duplicates > 0 { + freeze.num_cleared_duplicates -=3D 1; + drop(node_refs); + writer.write_code(BR_CLEAR_FREEZE_NOTIFICATION_DONE)?; + writer.write_payload(&self.cookie.0)?; + return Ok(true); + } + + if freeze.is_pending { + return Ok(true); + } + if freeze.is_clearing { + _removed_listener =3D freeze_entry.remove_node(); + drop(node_refs); + writer.write_code(BR_CLEAR_FREEZE_NOTIFICATION_DONE)?; + writer.write_payload(&self.cookie.0)?; + Ok(true) + } else { + let is_frozen =3D freeze.node.owner.inner.lock().is_frozen; + if freeze.last_is_frozen =3D=3D Some(is_frozen) { + return Ok(true); + } + + let mut state_info =3D BinderFrozenStateInfo::default(); + state_info.is_frozen =3D is_frozen as u32; + state_info.cookie =3D freeze.cookie.0; + freeze.is_pending =3D true; + freeze.last_is_frozen =3D Some(is_frozen); + drop(node_refs); + + writer.write_code(BR_FROZEN_BINDER)?; + writer.write_payload(&state_info)?; + // BR_FROZEN_BINDER notifications can cause transactions + Ok(false) + } + } + + fn cancel(self: DArc) {} + + fn should_sync_wakeup(&self) -> bool { + false + } + + #[inline(never)] + fn debug_print(&self, m: &SeqFile, prefix: &str, _tprefix: &str) -> Re= sult<()> { + seq_print!(m, "{}has frozen binder\n", prefix); + Ok(()) + } +} + +impl FreezeListener { + pub(crate) fn on_process_exit(&self, proc: &Arc) { + if !self.is_clearing { + self.node.remove_freeze_listener(proc); + } + } +} + +impl Process { + pub(crate) fn request_freeze_notif( + self: &Arc, + reader: &mut UserSliceReader, + ) -> Result<()> { + let hc =3D reader.read::()?; + let handle =3D hc.handle; + let cookie =3D FreezeCookie(hc.cookie); + + let msg =3D FreezeMessage::new(GFP_KERNEL)?; + let alloc =3D 
RBTreeNodeReservation::new(GFP_KERNEL)?; + + let mut node_refs_guard =3D self.node_refs.lock(); + let node_refs =3D &mut *node_refs_guard; + let Some(info) =3D node_refs.by_handle.get_mut(&handle) else { + pr_warn!("BC_REQUEST_FREEZE_NOTIFICATION invalid ref {}\n", ha= ndle); + return Err(EINVAL); + }; + if info.freeze().is_some() { + pr_warn!("BC_REQUEST_FREEZE_NOTIFICATION already set\n"); + return Err(EINVAL); + } + let node_ref =3D info.node_ref(); + let freeze_entry =3D node_refs.freeze_listeners.entry(cookie); + + if let rbtree::Entry::Occupied(ref dupe) =3D freeze_entry { + if !dupe.get().allow_duplicate(&node_ref.node) { + pr_warn!("BC_REQUEST_FREEZE_NOTIFICATION duplicate cookie\= n"); + return Err(EINVAL); + } + } + + // All failure paths must come before this call, and all modificat= ions must come after this + // call. + node_ref.node.add_freeze_listener(self, GFP_KERNEL)?; + + match freeze_entry { + rbtree::Entry::Vacant(entry) =3D> { + entry.insert( + FreezeListener { + cookie, + node: node_ref.node.clone(), + last_is_frozen: None, + is_pending: false, + is_clearing: false, + num_pending_duplicates: 0, + num_cleared_duplicates: 0, + }, + alloc, + ); + } + rbtree::Entry::Occupied(mut dupe) =3D> { + let dupe =3D dupe.get_mut(); + if dupe.is_pending { + dupe.num_pending_duplicates +=3D 1; + } else { + dupe.num_cleared_duplicates +=3D 1; + } + dupe.last_is_frozen =3D None; + dupe.is_pending =3D false; + dupe.is_clearing =3D false; + } + } + + *info.freeze() =3D Some(cookie); + let msg =3D FreezeMessage::init(msg, cookie); + drop(node_refs_guard); + let _ =3D self.push_work(msg); + Ok(()) + } + + pub(crate) fn freeze_notif_done(self: &Arc, reader: &mut UserSli= ceReader) -> Result<()> { + let cookie =3D FreezeCookie(reader.read()?); + let alloc =3D FreezeMessage::new(GFP_KERNEL)?; + let mut node_refs_guard =3D self.node_refs.lock(); + let node_refs =3D &mut *node_refs_guard; + let Some(freeze) =3D node_refs.freeze_listeners.get_mut(&cookie) e= lse { + pr_warn!("BC_FREEZE_NOTIFICATION_DONE {:016x} not found\n", co= okie.0); + return Err(EINVAL); + }; + let mut clear_msg =3D None; + if freeze.num_pending_duplicates > 0 { + clear_msg =3D Some(FreezeMessage::init(alloc, cookie)); + freeze.num_pending_duplicates -=3D 1; + freeze.num_cleared_duplicates +=3D 1; + } else { + if !freeze.is_pending { + pr_warn!( + "BC_FREEZE_NOTIFICATION_DONE {:016x} not pending\n", + cookie.0 + ); + return Err(EINVAL); + } + if freeze.is_clearing { + // Immediately send another FreezeMessage for BR_CLEAR_FRE= EZE_NOTIFICATION_DONE. 
+ clear_msg =3D Some(FreezeMessage::init(alloc, cookie)); + } + freeze.is_pending =3D false; + } + drop(node_refs_guard); + if let Some(clear_msg) =3D clear_msg { + let _ =3D self.push_work(clear_msg); + } + Ok(()) + } + + pub(crate) fn clear_freeze_notif(self: &Arc, reader: &mut UserSl= iceReader) -> Result<()> { + let hc =3D reader.read::()?; + let handle =3D hc.handle; + let cookie =3D FreezeCookie(hc.cookie); + + let alloc =3D FreezeMessage::new(GFP_KERNEL)?; + let mut node_refs_guard =3D self.node_refs.lock(); + let node_refs =3D &mut *node_refs_guard; + let Some(info) =3D node_refs.by_handle.get_mut(&handle) else { + pr_warn!("BC_CLEAR_FREEZE_NOTIFICATION invalid ref {}\n", hand= le); + return Err(EINVAL); + }; + let Some(info_cookie) =3D info.freeze() else { + pr_warn!("BC_CLEAR_FREEZE_NOTIFICATION freeze notification not= active\n"); + return Err(EINVAL); + }; + if *info_cookie !=3D cookie { + pr_warn!("BC_CLEAR_FREEZE_NOTIFICATION freeze notification coo= kie mismatch\n"); + return Err(EINVAL); + } + let Some(listener) =3D node_refs.freeze_listeners.get_mut(&cookie)= else { + pr_warn!("BC_CLEAR_FREEZE_NOTIFICATION invalid cookie {}\n", h= andle); + return Err(EINVAL); + }; + listener.is_clearing =3D true; + listener.node.remove_freeze_listener(self); + *info.freeze() =3D None; + let mut msg =3D None; + if !listener.is_pending { + msg =3D Some(FreezeMessage::init(alloc, cookie)); + } + drop(node_refs_guard); + + if let Some(msg) =3D msg { + let _ =3D self.push_work(msg); + } + Ok(()) + } + + fn get_freeze_cookie(&self, node: &DArc) -> Option= { + let node_refs =3D &mut *self.node_refs.lock(); + let handle =3D node_refs.by_node.get(&node.global_id())?; + let node_ref =3D node_refs.by_handle.get_mut(handle)?; + *node_ref.freeze() + } + + /// Creates a vector of every freeze listener on this process. + /// + /// Returns pairs of the remote process listening for notifications an= d the local node it is + /// listening on. + #[expect(clippy::type_complexity)] + fn find_freeze_recipients(&self) -> Result, Arc)>, AllocError> { + // Defined before `inner` to drop after releasing spinlock if `pus= h_within_capacity` fails. + let mut node_proc_pair; + + // We pre-allocate space for up to 8 recipients before we take the= spinlock. However, if + // the allocation fails, use a vector with a capacity of zero inst= ead of failing. After + // all, there might not be any freeze listeners, in which case thi= s operation could still + // succeed. + let mut recipients =3D + KVVec::with_capacity(8, GFP_KERNEL).unwrap_or_else(|_err| KVVe= c::new()); + + let mut inner =3D self.lock_with_nodes(); + let mut curr =3D inner.nodes.cursor_front(); + while let Some(cursor) =3D curr { + let (key, node) =3D cursor.current(); + let key =3D *key; + let list =3D node.freeze_list(&inner.inner); + let len =3D list.len(); + + if recipients.spare_capacity_mut().len() < len { + drop(inner); + recipients.reserve(len, GFP_KERNEL)?; + inner =3D self.lock_with_nodes(); + // Find the node we were looking at and try again. If the = set of nodes was changed, + // then just proceed to the next node. This is ok because = we don't guarantee the + // inclusion of nodes that are added or removed in paralle= l with this operation. 
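+                // `cursor_lower_bound(&key)` resumes the walk at the saved key, or at the
+                // next key after it if that node went away while the lock was dropped.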
+ curr =3D inner.nodes.cursor_lower_bound(&key); + continue; + } + + for proc in list { + node_proc_pair =3D (node.clone(), proc.clone()); + recipients + .push_within_capacity(node_proc_pair) + .map_err(|_| { + pr_err!( + "push_within_capacity failed even though we ch= ecked the capacity\n" + ); + AllocError + })?; + } + + curr =3D cursor.move_next(); + } + Ok(recipients) + } + + /// Prepare allocations for sending freeze messages. + pub(crate) fn prepare_freeze_messages(&self) -> Result { + let recipients =3D self.find_freeze_recipients()?; + let mut batch =3D KVVec::with_capacity(recipients.len(), GFP_KERNE= L)?; + for (node, proc) in recipients { + let Some(cookie) =3D proc.get_freeze_cookie(&node) else { + // If the freeze listener was removed in the meantime, jus= t discard the + // notification. + continue; + }; + let msg_alloc =3D FreezeMessage::new(GFP_KERNEL)?; + let msg =3D FreezeMessage::init(msg_alloc, cookie); + batch.push((proc, msg), GFP_KERNEL)?; + } + + Ok(FreezeMessages { batch }) + } +} + +pub(crate) struct FreezeMessages { + batch: KVVec<(Arc, DLArc)>, +} + +impl FreezeMessages { + pub(crate) fn send_messages(self) { + for (proc, msg) in self.batch { + let _ =3D proc.push_work(msg); + } + } +} diff --git a/drivers/android/binder/node.rs b/drivers/android/binder/node.rs new file mode 100644 index 0000000000000000000000000000000000000000..ade895ef791ec5746f9f5c1bfc1= 5f47d59829455 --- /dev/null +++ b/drivers/android/binder/node.rs @@ -0,0 +1,1131 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. + +use kernel::{ + list::{AtomicTracker, List, ListArc, ListLinks, TryNewListArc}, + prelude::*, + seq_file::SeqFile, + seq_print, + sync::lock::{spinlock::SpinLockBackend, Guard}, + sync::{Arc, LockedBy, SpinLock}, +}; + +use crate::{ + defs::*, + error::BinderError, + process::{NodeRefInfo, Process, ProcessInner}, + thread::Thread, + transaction::Transaction, + BinderReturnWriter, DArc, DLArc, DTRWrap, DeliverToRead, +}; + +use core::mem; + +mod wrapper; +pub(crate) use self::wrapper::CritIncrWrapper; + +#[derive(Debug)] +pub(crate) struct CouldNotDeliverCriticalIncrement; + +/// Keeps track of how this node is scheduled. +/// +/// There are two ways to schedule a node to a work list. Just schedule th= e node itself, or +/// allocate a wrapper that references the node and schedule the wrapper. = These wrappers exists to +/// make it possible to "move" a node from one list to another - when `do_= work` is called directly +/// on the `Node`, then it's a no-op if there's also a pending wrapper. +/// +/// Wrappers are generally only needed for zero-to-one refcount increments= , and there are two cases +/// of this: weak increments and strong increments. We call such increment= s "critical" because it +/// is critical that they are delivered to the thread doing the increment.= Some examples: +/// +/// * One thread makes a zero-to-one strong increment, and another thread = makes a zero-to-one weak +/// increment. Delivering the node to the thread doing the weak incremen= t is wrong, since the +/// thread doing the strong increment may have ended a long time ago whe= n the command is actually +/// processed by userspace. +/// +/// * We have a weak reference and are about to drop it on one thread. But= then another thread does +/// a zero-to-one strong increment. 
If the strong increment gets sent to= the thread that was +/// about to drop the weak reference, then the strong increment could be= processed after the +/// other thread has already exited, which would be too late. +/// +/// Note that trying to create a `ListArc` to the node can succeed even if= `has_normal_push` is +/// set. This is because another thread might just have popped the node fr= om a todo list, but not +/// yet called `do_work`. However, if `has_normal_push` is false, then cre= ating a `ListArc` should +/// always succeed. +/// +/// Like the other fields in `NodeInner`, the delivery state is protected = by the process lock. +struct DeliveryState { + /// Is the `Node` currently scheduled? + has_pushed_node: bool, + + /// Is a wrapper currently scheduled? + /// + /// The wrapper is used only for strong zero2one increments. + has_pushed_wrapper: bool, + + /// Is the currently scheduled `Node` scheduled due to a weak zero2one= increment? + /// + /// Weak zero2one operations are always scheduled using the `Node`. + has_weak_zero2one: bool, + + /// Is the currently scheduled wrapper/`Node` scheduled due to a stron= g zero2one increment? + /// + /// If `has_pushed_wrapper` is set, then the strong zero2one increment= was scheduled using the + /// wrapper. Otherwise, `has_pushed_node` must be set and it was sched= uled using the `Node`. + has_strong_zero2one: bool, +} + +impl DeliveryState { + fn should_normal_push(&self) -> bool { + !self.has_pushed_node && !self.has_pushed_wrapper + } + + fn did_normal_push(&mut self) { + assert!(self.should_normal_push()); + self.has_pushed_node =3D true; + } + + fn should_push_weak_zero2one(&self) -> bool { + !self.has_weak_zero2one && !self.has_strong_zero2one + } + + fn can_push_weak_zero2one_normally(&self) -> bool { + !self.has_pushed_node + } + + fn did_push_weak_zero2one(&mut self) { + assert!(self.should_push_weak_zero2one()); + assert!(self.can_push_weak_zero2one_normally()); + self.has_pushed_node =3D true; + self.has_weak_zero2one =3D true; + } + + fn should_push_strong_zero2one(&self) -> bool { + !self.has_strong_zero2one + } + + fn can_push_strong_zero2one_normally(&self) -> bool { + !self.has_pushed_node + } + + fn did_push_strong_zero2one(&mut self) { + assert!(self.should_push_strong_zero2one()); + assert!(self.can_push_strong_zero2one_normally()); + self.has_pushed_node =3D true; + self.has_strong_zero2one =3D true; + } + + fn did_push_strong_zero2one_wrapper(&mut self) { + assert!(self.should_push_strong_zero2one()); + assert!(!self.can_push_strong_zero2one_normally()); + self.has_pushed_wrapper =3D true; + self.has_strong_zero2one =3D true; + } +} + +struct CountState { + /// The reference count. + count: usize, + /// Whether the process that owns this node thinks that we hold a refc= ount on it. (Note that + /// even if count is greater than one, we only increment it once in th= e owning process.) + has_count: bool, +} + +impl CountState { + fn new() -> Self { + Self { + count: 0, + has_count: false, + } + } +} + +struct NodeInner { + /// Strong refcounts held on this node by `NodeRef` objects. + strong: CountState, + /// Weak refcounts held on this node by `NodeRef` objects. + weak: CountState, + delivery_state: DeliveryState, + /// The binder driver guarantees that oneway transactions sent to the = same node are serialized, + /// that is, userspace will not be given the next one until it has fin= ished processing the + /// previous oneway transaction. 
This is done to avoid the case where = two oneway transactions + /// arrive in opposite order from the order in which they were sent. (= E.g., they could be + /// delivered to two different threads, which could appear as-if they = were sent in opposite + /// order.) + /// + /// To fix that, we store pending oneway transactions in a separate li= st in the node, and don't + /// deliver the next oneway transaction until userspace signals that i= t has finished processing + /// the previous oneway transaction by calling the `BC_FREE_BUFFER` io= ctl. + oneway_todo: List>, + /// Keeps track of whether this node has a pending oneway transaction. + /// + /// When this is true, incoming oneway transactions are stored in `one= way_todo`, instead of + /// being delivered directly to the process. + has_oneway_transaction: bool, + /// List of processes to deliver a notification to when this node is d= estroyed (usually due to + /// the process dying). + death_list: List, 1>, + /// List of processes to deliver freeze notifications to. + freeze_list: KVVec>, + /// The number of active BR_INCREFS or BR_ACQUIRE operations. (should = be maximum two) + /// + /// If this is non-zero, then we postpone any BR_RELEASE or BR_DECREFS= notifications until the + /// active operations have ended. This avoids the situation an increme= nt and decrement get + /// reordered from userspace's perspective. + active_inc_refs: u8, + /// List of `NodeRefInfo` objects that reference this node. + refs: List, +} + +#[pin_data] +pub(crate) struct Node { + pub(crate) debug_id: usize, + ptr: u64, + pub(crate) cookie: u64, + pub(crate) flags: u32, + pub(crate) owner: Arc, + inner: LockedBy, + #[pin] + links_track: AtomicTracker, +} + +kernel::list::impl_list_arc_safe! { + impl ListArcSafe<0> for Node { + tracked_by links_track: AtomicTracker; + } +} + +// Make `oneway_todo` work. +kernel::list::impl_list_item! 
{ + impl ListItem<0> for DTRWrap { + using ListLinks { self.links.inner }; + } +} + +impl Node { + pub(crate) fn new( + ptr: u64, + cookie: u64, + flags: u32, + owner: Arc, + ) -> impl PinInit { + pin_init!(Self { + inner: LockedBy::new( + &owner.inner, + NodeInner { + strong: CountState::new(), + weak: CountState::new(), + delivery_state: DeliveryState { + has_pushed_node: false, + has_pushed_wrapper: false, + has_weak_zero2one: false, + has_strong_zero2one: false, + }, + death_list: List::new(), + oneway_todo: List::new(), + freeze_list: KVVec::new(), + has_oneway_transaction: false, + active_inc_refs: 0, + refs: List::new(), + }, + ), + debug_id: super::next_debug_id(), + ptr, + cookie, + flags, + owner, + links_track <- AtomicTracker::new(), + }) + } + + pub(crate) fn has_oneway_transaction(&self, owner_inner: &mut ProcessI= nner) -> bool { + let inner =3D self.inner.access_mut(owner_inner); + inner.has_oneway_transaction + } + + #[inline(never)] + pub(crate) fn full_debug_print( + &self, + m: &SeqFile, + owner_inner: &mut ProcessInner, + ) -> Result<()> { + let inner =3D self.inner.access_mut(owner_inner); + seq_print!( + m, + " node {}: u{:016x} c{:016x} hs {} hw {} cs {} cw {}", + self.debug_id, + self.ptr, + self.cookie, + inner.strong.has_count, + inner.weak.has_count, + inner.strong.count, + inner.weak.count, + ); + if !inner.refs.is_empty() { + seq_print!(m, " proc"); + for node_ref in &inner.refs { + seq_print!(m, " {}", node_ref.process.task.pid()); + } + } + seq_print!(m, "\n"); + for t in &inner.oneway_todo { + t.debug_print_inner(m, " pending async transaction "); + } + Ok(()) + } + + /// Insert the `NodeRef` into this `refs` list. + /// + /// # Safety + /// + /// It must be the case that `info.node_ref.node` is this node. + pub(crate) unsafe fn insert_node_info( + &self, + info: ListArc, + ) { + self.inner + .access_mut(&mut self.owner.inner.lock()) + .refs + .push_front(info); + } + + /// Insert the `NodeRef` into this `refs` list. + /// + /// # Safety + /// + /// It must be the case that `info.node_ref.node` is this node. + pub(crate) unsafe fn remove_node_info( + &self, + info: &NodeRefInfo, + ) -> Option> { + // SAFETY: We always insert `NodeRefInfo` objects into the `refs` = list of the node that it + // references in `info.node_ref.node`. That is this node, so `info= ` cannot possibly be in + // the `refs` list of another node. + unsafe { + self.inner + .access_mut(&mut self.owner.inner.lock()) + .refs + .remove(info) + } + } + + /// An id that is unique across all binder nodes on the system. Used a= s the key in the + /// `by_node` map. + pub(crate) fn global_id(&self) -> usize { + self as *const Node as usize + } + + pub(crate) fn get_id(&self) -> (u64, u64) { + (self.ptr, self.cookie) + } + + pub(crate) fn add_death( + &self, + death: ListArc, 1>, + guard: &mut Guard<'_, ProcessInner, SpinLockBackend>, + ) { + self.inner.access_mut(guard).death_list.push_back(death); + } + + pub(crate) fn inc_ref_done_locked( + self: &DArc, + _strong: bool, + owner_inner: &mut ProcessInner, + ) -> Option> { + let inner =3D self.inner.access_mut(owner_inner); + if inner.active_inc_refs =3D=3D 0 { + pr_err!("inc_ref_done called when no active inc_refs"); + return None; + } + + inner.active_inc_refs -=3D 1; + if inner.active_inc_refs =3D=3D 0 { + // Having active inc_refs can inhibit dropping of ref-counts. = Calculate whether we + // would send a refcount decrement, and if so, tell the caller= to schedule us. 
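+            // This mirrors the bookkeeping in `do_work_locked`: a count that userspace
+            // still believes it holds (`has_count`) but that has dropped to zero on the
+            // driver side must now be released with BR_RELEASE / BR_DECREFS.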
+ let strong =3D inner.strong.count > 0; + let has_strong =3D inner.strong.has_count; + let weak =3D strong || inner.weak.count > 0; + let has_weak =3D inner.weak.has_count; + + let should_drop_weak =3D !weak && has_weak; + let should_drop_strong =3D !strong && has_strong; + + // If we want to drop the ref-count again, tell the caller to = schedule a work node for + // that. + let need_push =3D should_drop_weak || should_drop_strong; + + if need_push && inner.delivery_state.should_normal_push() { + let list_arc =3D ListArc::try_from_arc(self.clone()).ok().= unwrap(); + inner.delivery_state.did_normal_push(); + Some(list_arc) + } else { + None + } + } else { + None + } + } + + pub(crate) fn update_refcount_locked( + self: &DArc, + inc: bool, + strong: bool, + count: usize, + owner_inner: &mut ProcessInner, + ) -> Option> { + let is_dead =3D owner_inner.is_dead; + let inner =3D self.inner.access_mut(owner_inner); + + // Get a reference to the state we'll update. + let state =3D if strong { + &mut inner.strong + } else { + &mut inner.weak + }; + + // Update the count and determine whether we need to push work. + let need_push =3D if inc { + state.count +=3D count; + // TODO: This method shouldn't be used for zero-to-one increme= nts. + !is_dead && !state.has_count + } else { + if state.count < count { + pr_err!("Failure: refcount underflow!"); + return None; + } + state.count -=3D count; + !is_dead && state.count =3D=3D 0 && state.has_count + }; + + if need_push && inner.delivery_state.should_normal_push() { + let list_arc =3D ListArc::try_from_arc(self.clone()).ok().unwr= ap(); + inner.delivery_state.did_normal_push(); + Some(list_arc) + } else { + None + } + } + + pub(crate) fn incr_refcount_allow_zero2one( + self: &DArc, + strong: bool, + owner_inner: &mut ProcessInner, + ) -> Result>, CouldNotDeliverCriticalIncrement> { + let is_dead =3D owner_inner.is_dead; + let inner =3D self.inner.access_mut(owner_inner); + + // Get a reference to the state we'll update. + let state =3D if strong { + &mut inner.strong + } else { + &mut inner.weak + }; + + // Update the count and determine whether we need to push work. + state.count +=3D 1; + if is_dead || state.has_count { + return Ok(None); + } + + // Userspace needs to be notified of this. + if !strong && inner.delivery_state.should_push_weak_zero2one() { + assert!(inner.delivery_state.can_push_weak_zero2one_normally()= ); + let list_arc =3D ListArc::try_from_arc(self.clone()).ok().unwr= ap(); + inner.delivery_state.did_push_weak_zero2one(); + Ok(Some(list_arc)) + } else if strong && inner.delivery_state.should_push_strong_zero2o= ne() { + if inner.delivery_state.can_push_strong_zero2one_normally() { + let list_arc =3D ListArc::try_from_arc(self.clone()).ok().= unwrap(); + inner.delivery_state.did_push_strong_zero2one(); + Ok(Some(list_arc)) + } else { + state.count -=3D 1; + Err(CouldNotDeliverCriticalIncrement) + } + } else { + // Work is already pushed, and we don't need to push again. 
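+            // A zero-to-one increment is already scheduled; its delivery reads the counts
+            // at that point, so it will pick up this increment as well.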
+ Ok(None) + } + } + + pub(crate) fn incr_refcount_allow_zero2one_with_wrapper( + self: &DArc, + strong: bool, + wrapper: CritIncrWrapper, + owner_inner: &mut ProcessInner, + ) -> Option> { + match self.incr_refcount_allow_zero2one(strong, owner_inner) { + Ok(Some(node)) =3D> Some(node as _), + Ok(None) =3D> None, + Err(CouldNotDeliverCriticalIncrement) =3D> { + assert!(strong); + let inner =3D self.inner.access_mut(owner_inner); + inner.strong.count +=3D 1; + inner.delivery_state.did_push_strong_zero2one_wrapper(); + Some(wrapper.init(self.clone())) + } + } + } + + pub(crate) fn update_refcount(self: &DArc, inc: bool, count: usi= ze, strong: bool) { + self.owner + .inner + .lock() + .update_node_refcount(self, inc, strong, count, None); + } + + pub(crate) fn populate_counts( + &self, + out: &mut BinderNodeInfoForRef, + guard: &Guard<'_, ProcessInner, SpinLockBackend>, + ) { + let inner =3D self.inner.access(guard); + out.strong_count =3D inner.strong.count as _; + out.weak_count =3D inner.weak.count as _; + } + + pub(crate) fn populate_debug_info( + &self, + out: &mut BinderNodeDebugInfo, + guard: &Guard<'_, ProcessInner, SpinLockBackend>, + ) { + out.ptr =3D self.ptr as _; + out.cookie =3D self.cookie as _; + let inner =3D self.inner.access(guard); + if inner.strong.has_count { + out.has_strong_ref =3D 1; + } + if inner.weak.has_count { + out.has_weak_ref =3D 1; + } + } + + pub(crate) fn force_has_count(&self, guard: &mut Guard<'_, ProcessInne= r, SpinLockBackend>) { + let inner =3D self.inner.access_mut(guard); + inner.strong.has_count =3D true; + inner.weak.has_count =3D true; + } + + fn write(&self, writer: &mut BinderReturnWriter<'_>, code: u32) -> Res= ult { + writer.write_code(code)?; + writer.write_payload(&self.ptr)?; + writer.write_payload(&self.cookie)?; + Ok(()) + } + + pub(crate) fn submit_oneway( + &self, + transaction: DLArc, + guard: &mut Guard<'_, ProcessInner, SpinLockBackend>, + ) -> Result<(), (BinderError, DLArc)> { + if guard.is_dead { + return Err((BinderError::new_dead(), transaction)); + } + + let inner =3D self.inner.access_mut(guard); + if inner.has_oneway_transaction { + inner.oneway_todo.push_back(transaction); + } else { + inner.has_oneway_transaction =3D true; + guard.push_work(transaction)?; + } + Ok(()) + } + + pub(crate) fn release(&self) { + let mut guard =3D self.owner.inner.lock(); + while let Some(work) =3D self.inner.access_mut(&mut guard).oneway_= todo.pop_front() { + drop(guard); + work.into_arc().cancel(); + guard =3D self.owner.inner.lock(); + } + + let death_list =3D core::mem::take(&mut self.inner.access_mut(&mut= guard).death_list); + drop(guard); + for death in death_list { + death.into_arc().set_dead(); + } + } + + pub(crate) fn pending_oneway_finished(&self) { + let mut guard =3D self.owner.inner.lock(); + if guard.is_dead { + // Cleanup will happen in `Process::deferred_release`. + return; + } + + let inner =3D self.inner.access_mut(&mut guard); + + let transaction =3D inner.oneway_todo.pop_front(); + inner.has_oneway_transaction =3D transaction.is_some(); + if let Some(transaction) =3D transaction { + match guard.push_work(transaction) { + Ok(()) =3D> {} + Err((_err, work)) =3D> { + // Process is dead. + // This shouldn't happen due to the `is_dead` check, b= ut if it does, just drop + // the transaction and return. + drop(guard); + drop(work); + } + } + } + } + + /// Finds an outdated transaction that the given transaction can repla= ce. + /// + /// If one is found, it is removed from the list and returned. 
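+    ///
+    /// Whether `new` may replace a queued entry is decided by its `can_replace` method.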
+ pub(crate) fn take_outdated_transaction( + &self, + new: &Transaction, + guard: &mut Guard<'_, ProcessInner, SpinLockBackend>, + ) -> Option> { + let inner =3D self.inner.access_mut(guard); + let mut cursor =3D inner.oneway_todo.cursor_front(); + while let Some(next) =3D cursor.peek_next() { + if new.can_replace(&next) { + return Some(next.remove()); + } + cursor.move_next(); + } + None + } + + /// This is split into a separate function since it's called by both `= Node::do_work` and + /// `NodeWrapper::do_work`. + fn do_work_locked( + &self, + writer: &mut BinderReturnWriter<'_>, + mut guard: Guard<'_, ProcessInner, SpinLockBackend>, + ) -> Result { + let inner =3D self.inner.access_mut(&mut guard); + let strong =3D inner.strong.count > 0; + let has_strong =3D inner.strong.has_count; + let weak =3D strong || inner.weak.count > 0; + let has_weak =3D inner.weak.has_count; + + if weak && !has_weak { + inner.weak.has_count =3D true; + inner.active_inc_refs +=3D 1; + } + + if strong && !has_strong { + inner.strong.has_count =3D true; + inner.active_inc_refs +=3D 1; + } + + let no_active_inc_refs =3D inner.active_inc_refs =3D=3D 0; + let should_drop_weak =3D no_active_inc_refs && (!weak && has_weak); + let should_drop_strong =3D no_active_inc_refs && (!strong && has_s= trong); + if should_drop_weak { + inner.weak.has_count =3D false; + } + if should_drop_strong { + inner.strong.has_count =3D false; + } + if no_active_inc_refs && !weak { + // Remove the node if there are no references to it. + guard.remove_node(self.ptr); + } + drop(guard); + + if weak && !has_weak { + self.write(writer, BR_INCREFS)?; + } + if strong && !has_strong { + self.write(writer, BR_ACQUIRE)?; + } + if should_drop_strong { + self.write(writer, BR_RELEASE)?; + } + if should_drop_weak { + self.write(writer, BR_DECREFS)?; + } + + Ok(true) + } + + pub(crate) fn add_freeze_listener( + &self, + process: &Arc, + flags: kernel::alloc::Flags, + ) -> Result { + let mut vec_alloc =3D KVVec::>::new(); + loop { + let mut guard =3D self.owner.inner.lock(); + // Do not check for `guard.dead`. The `dead` flag that matters= here is the owner of the + // listener, no the target. 
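+            // Growth strategy: if `freeze_list` is full, drop the lock, allocate a larger
+            // vector, and retry; the existing entries are then moved into the new
+            // allocation so the push can complete within capacity.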
+ let inner =3D self.inner.access_mut(&mut guard); + let len =3D inner.freeze_list.len(); + if len >=3D inner.freeze_list.capacity() { + if len >=3D vec_alloc.capacity() { + drop(guard); + vec_alloc =3D KVVec::with_capacity((1 + len).next_powe= r_of_two(), flags)?; + continue; + } + mem::swap(&mut inner.freeze_list, &mut vec_alloc); + for elem in vec_alloc.drain_all() { + inner.freeze_list.push_within_capacity(elem)?; + } + } + inner.freeze_list.push_within_capacity(process.clone())?; + return Ok(()); + } + } + + pub(crate) fn remove_freeze_listener(&self, p: &Arc) { + let _unused_capacity; + let mut guard =3D self.owner.inner.lock(); + let inner =3D self.inner.access_mut(&mut guard); + let len =3D inner.freeze_list.len(); + inner.freeze_list.retain(|proc| !Arc::ptr_eq(proc, p)); + if len =3D=3D inner.freeze_list.len() { + pr_warn!( + "Could not remove freeze listener for {}\n", + p.pid_in_current_ns() + ); + } + if inner.freeze_list.is_empty() { + _unused_capacity =3D mem::replace(&mut inner.freeze_list, KVVe= c::new()); + } + } + + pub(crate) fn freeze_list<'a>(&'a self, guard: &'a ProcessInner) -> &'= a [Arc] { + &self.inner.access(guard).freeze_list + } +} + +impl DeliverToRead for Node { + fn do_work( + self: DArc, + _thread: &Thread, + writer: &mut BinderReturnWriter<'_>, + ) -> Result { + let mut owner_inner =3D self.owner.inner.lock(); + let inner =3D self.inner.access_mut(&mut owner_inner); + + assert!(inner.delivery_state.has_pushed_node); + if inner.delivery_state.has_pushed_wrapper { + // If the wrapper is scheduled, then we are either a normal pu= sh or weak zero2one + // increment, and the wrapper is a strong zero2one increment, = so the wrapper always + // takes precedence over us. + assert!(inner.delivery_state.has_strong_zero2one); + inner.delivery_state.has_pushed_node =3D false; + inner.delivery_state.has_weak_zero2one =3D false; + return Ok(true); + } + + inner.delivery_state.has_pushed_node =3D false; + inner.delivery_state.has_weak_zero2one =3D false; + inner.delivery_state.has_strong_zero2one =3D false; + + self.do_work_locked(writer, owner_inner) + } + + fn cancel(self: DArc) {} + + fn should_sync_wakeup(&self) -> bool { + false + } + + #[inline(never)] + fn debug_print(&self, m: &SeqFile, prefix: &str, _tprefix: &str) -> Re= sult<()> { + seq_print!( + m, + "{}node work {}: u{:016x} c{:016x}\n", + prefix, + self.debug_id, + self.ptr, + self.cookie, + ); + Ok(()) + } +} + +/// Represents something that holds one or more ref-counts to a `Node`. +/// +/// Whenever process A holds a refcount to a node owned by a different pro= cess B, then process A +/// will store a `NodeRef` that refers to the `Node` in process B. When pr= ocess A releases the +/// refcount, we destroy the NodeRef, which decrements the ref-count in pr= ocess A. +/// +/// This type is also used for some other cases. For example, a transactio= n allocation holds a +/// refcount on the target node, and this is implemented by storing a `Nod= eRef` in the allocation +/// so that the destructor of the allocation will drop a refcount of the `= Node`. +pub(crate) struct NodeRef { + pub(crate) node: DArc, + /// How many times does this NodeRef hold a refcount on the Node? + strong_node_count: usize, + weak_node_count: usize, + /// How many times does userspace hold a refcount on this NodeRef? 
+ strong_count: usize, + weak_count: usize, +} + +impl NodeRef { + pub(crate) fn new(node: DArc, strong_count: usize, weak_count: u= size) -> Self { + Self { + node, + strong_node_count: strong_count, + weak_node_count: weak_count, + strong_count, + weak_count, + } + } + + pub(crate) fn absorb(&mut self, mut other: Self) { + assert!( + Arc::ptr_eq(&self.node, &other.node), + "absorb called with differing nodes" + ); + self.strong_node_count +=3D other.strong_node_count; + self.weak_node_count +=3D other.weak_node_count; + self.strong_count +=3D other.strong_count; + self.weak_count +=3D other.weak_count; + other.strong_count =3D 0; + other.weak_count =3D 0; + other.strong_node_count =3D 0; + other.weak_node_count =3D 0; + + if self.strong_node_count >=3D 2 || self.weak_node_count >=3D 2 { + let mut guard =3D self.node.owner.inner.lock(); + let inner =3D self.node.inner.access_mut(&mut guard); + + if self.strong_node_count >=3D 2 { + inner.strong.count -=3D self.strong_node_count - 1; + self.strong_node_count =3D 1; + assert_ne!(inner.strong.count, 0); + } + if self.weak_node_count >=3D 2 { + inner.weak.count -=3D self.weak_node_count - 1; + self.weak_node_count =3D 1; + assert_ne!(inner.weak.count, 0); + } + } + } + + pub(crate) fn get_count(&self) -> (usize, usize) { + (self.strong_count, self.weak_count) + } + + pub(crate) fn clone(&self, strong: bool) -> Result { + if strong && self.strong_count =3D=3D 0 { + return Err(EINVAL); + } + Ok(self + .node + .owner + .inner + .lock() + .new_node_ref(self.node.clone(), strong, None)) + } + + /// Updates (increments or decrements) the number of references held a= gainst the node. If the + /// count being updated transitions from 0 to 1 or from 1 to 0, the no= de is notified by having + /// its `update_refcount` function called. + /// + /// Returns whether `self` should be removed (when both counts are zer= o). + pub(crate) fn update(&mut self, inc: bool, strong: bool) -> bool { + if strong && self.strong_count =3D=3D 0 { + return false; + } + let (count, node_count, other_count) =3D if strong { + ( + &mut self.strong_count, + &mut self.strong_node_count, + self.weak_count, + ) + } else { + ( + &mut self.weak_count, + &mut self.weak_node_count, + self.strong_count, + ) + }; + if inc { + if *count =3D=3D 0 { + *node_count =3D 1; + self.node.update_refcount(true, 1, strong); + } + *count +=3D 1; + } else { + if *count =3D=3D 0 { + pr_warn!( + "pid {} performed invalid decrement on ref\n", + kernel::current!().pid() + ); + return false; + } + *count -=3D 1; + if *count =3D=3D 0 { + self.node.update_refcount(false, *node_count, strong); + *node_count =3D 0; + return other_count =3D=3D 0; + } + } + false + } +} + +impl Drop for NodeRef { + // This destructor is called conditionally from `Allocation::drop`. Th= at branch is often + // mispredicted. Inlining this method call reduces the cost of those b= ranch mispredictions. + #[inline(always)] + fn drop(&mut self) { + if self.strong_node_count > 0 { + self.node + .update_refcount(false, self.strong_node_count, true); + } + if self.weak_node_count > 0 { + self.node + .update_refcount(false, self.weak_node_count, false); + } + } +} + +struct NodeDeathInner { + dead: bool, + cleared: bool, + notification_done: bool, + /// Indicates whether the normal flow was interrupted by removing the = handle. In this case, we + /// need behave as if the death notification didn't exist (i.e., we do= n't deliver anything to + /// the user. + aborted: bool, +} + +/// Used to deliver notifications when a process dies. 
+/// +/// A process can request to be notified when a process dies using `BC_REQ= UEST_DEATH_NOTIFICATION`. +/// This will make the driver send a `BR_DEAD_BINDER` to userspace when th= e process dies (or +/// immediately if it is already dead). Userspace is supposed to respond w= ith `BC_DEAD_BINDER_DONE` +/// once it has processed the notification. +/// +/// Userspace can unregister from death notifications using the `BC_CLEAR_= DEATH_NOTIFICATION` +/// command. In this case, the kernel will respond with `BR_CLEAR_DEATH_NO= TIFICATION_DONE` once the +/// notification has been removed. Note that if the remote process dies be= fore the kernel has +/// responded with `BR_CLEAR_DEATH_NOTIFICATION_DONE`, then the kernel wil= l still send a +/// `BR_DEAD_BINDER`, which userspace must be able to process. In this cas= e, the kernel will wait +/// for the `BC_DEAD_BINDER_DONE` command before it sends `BR_CLEAR_DEATH_= NOTIFICATION_DONE`. +/// +/// Note that even if the kernel sends a `BR_DEAD_BINDER`, this does not r= emove the death +/// notification. Userspace must still remove it manually using `BC_CLEAR_= DEATH_NOTIFICATION`. +/// +/// If a process uses `BC_RELEASE` to destroy its last refcount on a node = that has an active death +/// registration, then the death registration is immediately deleted (we i= mplement this using the +/// `aborted` field). However, userspace is not supposed to delete a `Node= Ref` without first +/// deregistering death notifications, so this codepath is not executed un= der normal circumstances. +#[pin_data] +pub(crate) struct NodeDeath { + node: DArc, + process: Arc, + pub(crate) cookie: u64, + #[pin] + links_track: AtomicTracker<0>, + /// Used by the owner `Node` to store a list of registered death notif= ications. + /// + /// # Invariants + /// + /// Only ever used with the `death_list` list of `self.node`. + #[pin] + death_links: ListLinks<1>, + /// Used by the process to keep track of the death notifications for w= hich we have sent a + /// `BR_DEAD_BINDER` but not yet received a `BC_DEAD_BINDER_DONE`. + /// + /// # Invariants + /// + /// Only ever used with the `delivered_deaths` list of `self.process`. + #[pin] + delivered_links: ListLinks<2>, + #[pin] + delivered_links_track: AtomicTracker<2>, + #[pin] + inner: SpinLock, +} + +impl NodeDeath { + /// Constructs a new node death notification object. + pub(crate) fn new( + node: DArc, + process: Arc, + cookie: u64, + ) -> impl PinInit> { + DTRWrap::new(pin_init!( + Self { + node, + process, + cookie, + links_track <- AtomicTracker::new(), + death_links <- ListLinks::new(), + delivered_links <- ListLinks::new(), + delivered_links_track <- AtomicTracker::new(), + inner <- kernel::new_spinlock!(NodeDeathInner { + dead: false, + cleared: false, + notification_done: false, + aborted: false, + }, "NodeDeath::inner"), + } + )) + } + + /// Sets the cleared flag to `true`. + /// + /// It removes `self` from the node's death notification list if neede= d. + /// + /// Returns whether it needs to be queued. + pub(crate) fn set_cleared(self: &DArc, abort: bool) -> bool { + let (needs_removal, needs_queueing) =3D { + // Update state and determine if we need to queue a work item.= We only need to do it + // when the node is not dead or if the user already completed = the death notification. + let mut inner =3D self.inner.lock(); + if abort { + inner.aborted =3D true; + } + if inner.cleared { + // Already cleared. 
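+                // Nothing further needs to be queued for a repeated clear.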
+ return false; + } + inner.cleared =3D true; + (!inner.dead, !inner.dead || inner.notification_done) + }; + + // Remove death notification from node. + if needs_removal { + let mut owner_inner =3D self.node.owner.inner.lock(); + let node_inner =3D self.node.inner.access_mut(&mut owner_inner= ); + // SAFETY: A `NodeDeath` is never inserted into the death list= of any node other than + // its owner, so it is either in this death list or in no deat= h list. + unsafe { node_inner.death_list.remove(self) }; + } + needs_queueing + } + + /// Sets the 'notification done' flag to `true`. + pub(crate) fn set_notification_done(self: DArc, thread: &Thread)= { + let needs_queueing =3D { + let mut inner =3D self.inner.lock(); + inner.notification_done =3D true; + inner.cleared + }; + if needs_queueing { + if let Some(death) =3D ListArc::try_from_arc_or_drop(self) { + let _ =3D thread.push_work_if_looper(death); + } + } + } + + /// Sets the 'dead' flag to `true` and queues work item if needed. + pub(crate) fn set_dead(self: DArc) { + let needs_queueing =3D { + let mut inner =3D self.inner.lock(); + if inner.cleared { + false + } else { + inner.dead =3D true; + true + } + }; + if needs_queueing { + // Push the death notification to the target process. There is= nothing else to do if + // it's already dead. + if let Some(death) =3D ListArc::try_from_arc_or_drop(self) { + let process =3D death.process.clone(); + let _ =3D process.push_work(death); + } + } + } +} + +kernel::list::impl_list_arc_safe! { + impl ListArcSafe<0> for NodeDeath { + tracked_by links_track: AtomicTracker; + } +} + +kernel::list::impl_list_arc_safe! { + impl ListArcSafe<1> for DTRWrap { untracked; } +} +kernel::list::impl_list_item! { + impl ListItem<1> for DTRWrap { + using ListLinks { self.wrapped.death_links }; + } +} + +kernel::list::impl_list_arc_safe! { + impl ListArcSafe<2> for DTRWrap { + tracked_by wrapped: NodeDeath; + } +} +kernel::list::impl_list_arc_safe! { + impl ListArcSafe<2> for NodeDeath { + tracked_by delivered_links_track: AtomicTracker<2>; + } +} +kernel::list::impl_list_item! { + impl ListItem<2> for DTRWrap { + using ListLinks { self.wrapped.delivered_links }; + } +} + +impl DeliverToRead for NodeDeath { + fn do_work( + self: DArc, + _thread: &Thread, + writer: &mut BinderReturnWriter<'_>, + ) -> Result { + let done =3D { + let inner =3D self.inner.lock(); + if inner.aborted { + return Ok(true); + } + inner.cleared && (!inner.dead || inner.notification_done) + }; + + let cookie =3D self.cookie; + let cmd =3D if done { + BR_CLEAR_DEATH_NOTIFICATION_DONE + } else { + let process =3D self.process.clone(); + let mut process_inner =3D process.inner.lock(); + let inner =3D self.inner.lock(); + if inner.aborted { + return Ok(true); + } + // We're still holding the inner lock, so it cannot be aborted= while we insert it into + // the delivered list. + process_inner.death_delivered(self.clone()); + BR_DEAD_BINDER + }; + + writer.write_code(cmd)?; + writer.write_payload(&cookie)?; + // DEAD_BINDER notifications can cause transactions, so stop proce= ssing work items when we + // get to a death notification. 
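+        // (`do_work` returning `false` makes the thread stop draining its todo list here,
+        // so userspace sees the death notification before any further work is delivered.)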
+ Ok(cmd !=3D BR_DEAD_BINDER) + } + + fn cancel(self: DArc) {} + + fn should_sync_wakeup(&self) -> bool { + false + } + + #[inline(never)] + fn debug_print(&self, m: &SeqFile, prefix: &str, _tprefix: &str) -> Re= sult<()> { + let inner =3D self.inner.lock(); + + let dead_binder =3D inner.dead && !inner.notification_done; + + if dead_binder { + if inner.cleared { + seq_print!(m, "{}has cleared dead binder\n", prefix); + } else { + seq_print!(m, "{}has dead binder\n", prefix); + } + } else { + seq_print!(m, "{}has cleared death notification\n", prefix); + } + + Ok(()) + } +} diff --git a/drivers/android/binder/node/wrapper.rs b/drivers/android/binde= r/node/wrapper.rs new file mode 100644 index 0000000000000000000000000000000000000000..43294c050502926633b9fec92e8= 2e34f39f74fdb --- /dev/null +++ b/drivers/android/binder/node/wrapper.rs @@ -0,0 +1,78 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. + +use kernel::{list::ListArc, prelude::*, seq_file::SeqFile, seq_print, sync= ::UniqueArc}; + +use crate::{node::Node, thread::Thread, BinderReturnWriter, DArc, DLArc, D= TRWrap, DeliverToRead}; + +use core::mem::MaybeUninit; + +pub(crate) struct CritIncrWrapper { + inner: UniqueArc>>, +} + +impl CritIncrWrapper { + pub(crate) fn new() -> Result { + Ok(CritIncrWrapper { + inner: UniqueArc::new_uninit(GFP_KERNEL)?, + }) + } + + pub(super) fn init(self, node: DArc) -> DLArc= { + match self.inner.pin_init_with(DTRWrap::new(NodeWrapper { node }))= { + Ok(initialized) =3D> ListArc::from(initialized) as _, + Err(err) =3D> match err {}, + } + } +} + +struct NodeWrapper { + node: DArc, +} + +kernel::list::impl_list_arc_safe! { + impl ListArcSafe<0> for NodeWrapper { + untracked; + } +} + +impl DeliverToRead for NodeWrapper { + fn do_work( + self: DArc, + _thread: &Thread, + writer: &mut BinderReturnWriter<'_>, + ) -> Result { + let node =3D &self.node; + let mut owner_inner =3D node.owner.inner.lock(); + let inner =3D node.inner.access_mut(&mut owner_inner); + + let ds =3D &mut inner.delivery_state; + + assert!(ds.has_pushed_wrapper); + assert!(ds.has_strong_zero2one); + ds.has_pushed_wrapper =3D false; + ds.has_strong_zero2one =3D false; + + node.do_work_locked(writer, owner_inner) + } + + fn cancel(self: DArc) {} + + fn should_sync_wakeup(&self) -> bool { + false + } + + #[inline(never)] + fn debug_print(&self, m: &SeqFile, prefix: &str, _tprefix: &str) -> Re= sult<()> { + seq_print!( + m, + "{}node work {}: u{:016x} c{:016x}\n", + prefix, + self.node.debug_id, + self.node.ptr, + self.node.cookie, + ); + Ok(()) + } +} diff --git a/drivers/android/binder/page_range.rs b/drivers/android/binder/= page_range.rs new file mode 100644 index 0000000000000000000000000000000000000000..2ae17e6776c25f23e66a8334ca7= 3925641a4deeb --- /dev/null +++ b/drivers/android/binder/page_range.rs @@ -0,0 +1,746 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. + +//! This module has utilities for managing a page range where unused pages= may be reclaimed by a +//! vma shrinker. + +// To avoid deadlocks, locks are taken in the order: +// +// 1. mmap lock +// 2. spinlock +// 3. lru spinlock +// +// The shrinker will use trylock methods because it locks them in a differ= ent order. 
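+//
+// Concretely, the shrinker's free callback runs with the lru lock already held, so it
+// only uses trylock variants for the other locks and bails out with `LRU_SKIP` if any
+// of them is contended.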
+ +use core::{ + alloc::Layout, + marker::PhantomPinned, + mem::{size_of, size_of_val, MaybeUninit}, + ptr::{self, NonNull}, +}; + +use kernel::{ + alloc::allocator::KVmalloc, + alloc::Allocator, + bindings, + error::Result, + ffi::{c_ulong, c_void}, + mm::{virt, Mm, MmWithUser}, + new_mutex, new_spinlock, + page::{Page, PAGE_SHIFT, PAGE_SIZE}, + prelude::*, + str::CStr, + sync::{aref::ARef, Mutex, SpinLock}, + task::Pid, + transmute::FromBytes, + types::Opaque, + uaccess::UserSliceReader, +}; + +/// Represents a shrinker that can be registered with the kernel. +/// +/// Each shrinker can be used by many `ShrinkablePageRange` objects. +#[repr(C)] +pub(crate) struct Shrinker { + inner: Opaque<*mut bindings::shrinker>, + list_lru: Opaque, +} + +// SAFETY: The shrinker and list_lru are thread safe. +unsafe impl Send for Shrinker {} +// SAFETY: The shrinker and list_lru are thread safe. +unsafe impl Sync for Shrinker {} + +impl Shrinker { + /// Create a new shrinker. + /// + /// # Safety + /// + /// Before using this shrinker with a `ShrinkablePageRange`, the `regi= ster` method must have + /// been called exactly once, and it must not have returned an error. + pub(crate) const unsafe fn new() -> Self { + Self { + inner: Opaque::uninit(), + list_lru: Opaque::uninit(), + } + } + + /// Register this shrinker with the kernel. + pub(crate) fn register(&'static self, name: &CStr) -> Result<()> { + // SAFETY: These fields are not yet used, so it's okay to zero the= m. + unsafe { + self.inner.get().write(ptr::null_mut()); + self.list_lru.get().write_bytes(0, 1); + } + + // SAFETY: The field is not yet used, so we can initialize it. + let ret =3D unsafe { bindings::__list_lru_init(self.list_lru.get()= , false, ptr::null_mut()) }; + if ret !=3D 0 { + return Err(Error::from_errno(ret)); + } + + // SAFETY: The `name` points at a valid c string. + let shrinker =3D unsafe { bindings::shrinker_alloc(0, name.as_char= _ptr()) }; + if shrinker.is_null() { + // SAFETY: We initialized it, so its okay to destroy it. + unsafe { bindings::list_lru_destroy(self.list_lru.get()) }; + return Err(Error::from_errno(ret)); + } + + // SAFETY: We're about to register the shrinker, and these are the= fields we need to + // initialize. (All other fields are already zeroed.) + unsafe { + ptr::addr_of_mut!((*shrinker).count_objects).write(Some(rust_s= hrink_count)); + ptr::addr_of_mut!((*shrinker).scan_objects).write(Some(rust_sh= rink_scan)); + ptr::addr_of_mut!((*shrinker).private_data).write(self.list_lr= u.get().cast()); + } + + // SAFETY: The new shrinker has been fully initialized, so we can = register it. + unsafe { bindings::shrinker_register(shrinker) }; + + // SAFETY: This initializes the pointer to the shrinker so that we= can use it. + unsafe { self.inner.get().write(shrinker) }; + + Ok(()) + } +} + +/// A container that manages a page range in a vma. +/// +/// The pages can be thought of as an array of booleans of whether the pag= es are usable. The +/// methods `use_range` and `stop_using_range` set all booleans in a range= to true or false +/// respectively. Initially, no pages are allocated. When a page is not us= ed, it is not freed +/// immediately. Instead, it is made available to the memory shrinker to f= ree it if the device is +/// under memory pressure. 
+/// +/// It's okay for `use_range` and `stop_using_range` to race with each oth= er, although there's no +/// way to know whether an index ends up with true or false if a call to `= use_range` races with +/// another call to `stop_using_range` on a given index. +/// +/// It's also okay for the two methods to race with themselves, e.g. if tw= o threads call +/// `use_range` on the same index, then that's fine and neither call will = return until the page is +/// allocated and mapped. +/// +/// The methods that read or write to a range require that the page is mar= ked as in use. So it is +/// _not_ okay to call `stop_using_range` on a page that is in use by the = methods that read or +/// write to the page. +#[pin_data(PinnedDrop)] +pub(crate) struct ShrinkablePageRange { + /// Shrinker object registered with the kernel. + shrinker: &'static Shrinker, + /// Pid using this page range. Only used as debugging information. + pid: Pid, + /// The mm for the relevant process. + mm: ARef, + /// Used to synchronize calls to `vm_insert_page` and `zap_page_range_= single`. + #[pin] + mm_lock: Mutex<()>, + /// Spinlock protecting changes to pages. + #[pin] + lock: SpinLock, + + /// Must not move, since page info has pointers back. + #[pin] + _pin: PhantomPinned, +} + +struct Inner { + /// Array of pages. + /// + /// Since this is also accessed by the shrinker, we can't use a `Box`,= which asserts exclusive + /// ownership. To deal with that, we manage it using raw pointers. + pages: *mut PageInfo, + /// Length of the `pages` array. + size: usize, + /// The address of the vma to insert the pages into. + vma_addr: usize, +} + +// SAFETY: proper locking is in place for `Inner` +unsafe impl Send for Inner {} + +type StableMmGuard =3D + kernel::sync::lock::Guard<'static, (), kernel::sync::lock::mutex::Mute= xBackend>; + +/// An array element that describes the current state of a page. +/// +/// There are three states: +/// +/// * Free. The page is None. The `lru` element is not queued. +/// * Available. The page is Some. The `lru` element is queued to the shr= inker's lru. +/// * Used. The page is Some. The `lru` element is not queued. +/// +/// When an element is available, the shrinker is able to free the page. +#[repr(C)] +struct PageInfo { + lru: bindings::list_head, + page: Option, + range: *const ShrinkablePageRange, +} + +impl PageInfo { + /// # Safety + /// + /// The caller ensures that writing to `me.page` is ok, and that the p= age is not currently set. + unsafe fn set_page(me: *mut PageInfo, page: Page) { + // SAFETY: This pointer offset is in bounds. + let ptr =3D unsafe { ptr::addr_of_mut!((*me).page) }; + + // SAFETY: The pointer is valid for writing, so also valid for rea= ding. + if unsafe { (*ptr).is_some() } { + pr_err!("set_page called when there is already a page"); + // SAFETY: We will initialize the page again below. + unsafe { ptr::drop_in_place(ptr) }; + } + + // SAFETY: The pointer is valid for writing. + unsafe { ptr::write(ptr, Some(page)) }; + } + + /// # Safety + /// + /// The caller ensures that reading from `me.page` is ok for the durat= ion of 'a. + unsafe fn get_page<'a>(me: *const PageInfo) -> Option<&'a Page> { + // SAFETY: This pointer offset is in bounds. + let ptr =3D unsafe { ptr::addr_of!((*me).page) }; + + // SAFETY: The pointer is valid for reading. + unsafe { (*ptr).as_ref() } + } + + /// # Safety + /// + /// The caller ensures that writing to `me.page` is ok for the duratio= n of 'a. 
+ unsafe fn take_page(me: *mut PageInfo) -> Option { + // SAFETY: This pointer offset is in bounds. + let ptr =3D unsafe { ptr::addr_of_mut!((*me).page) }; + + // SAFETY: The pointer is valid for reading. + unsafe { (*ptr).take() } + } + + /// Add this page to the lru list, if not already in the list. + /// + /// # Safety + /// + /// The pointer must be valid, and it must be the right shrinker and n= id. + unsafe fn list_lru_add(me: *mut PageInfo, nid: i32, shrinker: &'static= Shrinker) { + // SAFETY: This pointer offset is in bounds. + let lru_ptr =3D unsafe { ptr::addr_of_mut!((*me).lru) }; + // SAFETY: The lru pointer is valid, and we're not using it with a= ny other lru list. + unsafe { bindings::list_lru_add(shrinker.list_lru.get(), lru_ptr, = nid, ptr::null_mut()) }; + } + + /// Remove this page from the lru list, if it is in the list. + /// + /// # Safety + /// + /// The pointer must be valid, and it must be the right shrinker and n= id. + unsafe fn list_lru_del(me: *mut PageInfo, nid: i32, shrinker: &'static= Shrinker) { + // SAFETY: This pointer offset is in bounds. + let lru_ptr =3D unsafe { ptr::addr_of_mut!((*me).lru) }; + // SAFETY: The lru pointer is valid, and we're not using it with a= ny other lru list. + unsafe { bindings::list_lru_del(shrinker.list_lru.get(), lru_ptr, = nid, ptr::null_mut()) }; + } +} + +impl ShrinkablePageRange { + /// Create a new `ShrinkablePageRange` using the given shrinker. + pub(crate) fn new(shrinker: &'static Shrinker) -> impl PinInit { + try_pin_init!(Self { + shrinker, + pid: kernel::current!().pid(), + mm: ARef::from(&**kernel::current!().mm().ok_or(ESRCH)?), + mm_lock <- new_mutex!((), "ShrinkablePageRange::mm"), + lock <- new_spinlock!(Inner { + pages: ptr::null_mut(), + size: 0, + vma_addr: 0, + }, "ShrinkablePageRange"), + _pin: PhantomPinned, + }) + } + + pub(crate) fn stable_trylock_mm(&self) -> Option { + // SAFETY: This extends the duration of the reference. Since this = call happens before + // `mm_lock` is taken in the destructor of `ShrinkablePageRange`, = the destructor will block + // until the returned guard is dropped. This ensures that the guar= d is valid until dropped. + let mm_lock =3D unsafe { &*ptr::from_ref(&self.mm_lock) }; + + mm_lock.try_lock() + } + + /// Register a vma with this page range. Returns the size of the regio= n. + pub(crate) fn register_with_vma(&self, vma: &virt::VmaNew) -> Result { + let num_bytes =3D usize::min(vma.end() - vma.start(), bindings::SZ= _4M as usize); + let num_pages =3D num_bytes >> PAGE_SHIFT; + + if !ptr::eq::(&*self.mm, &**vma.mm()) { + pr_debug!("Failed to register with vma: invalid vma->vm_mm"); + return Err(EINVAL); + } + if num_pages =3D=3D 0 { + pr_debug!("Failed to register with vma: size zero"); + return Err(EINVAL); + } + + let layout =3D Layout::array::(num_pages).map_err(|_| EN= OMEM)?; + let pages =3D KVmalloc::alloc(layout, GFP_KERNEL)?.cast::(); + + // SAFETY: This just initializes the pages array. + unsafe { + let self_ptr =3D self as *const ShrinkablePageRange; + for i in 0..num_pages { + let info =3D pages.as_ptr().add(i); + ptr::addr_of_mut!((*info).range).write(self_ptr); + ptr::addr_of_mut!((*info).page).write(None); + let lru =3D ptr::addr_of_mut!((*info).lru); + ptr::addr_of_mut!((*lru).next).write(lru); + ptr::addr_of_mut!((*lru).prev).write(lru); + } + } + + let mut inner =3D self.lock.lock(); + if inner.size > 0 { + pr_debug!("Failed to register with vma: already registered"); + drop(inner); + // SAFETY: The `pages` array was allocated with the same layou= t. 
+ unsafe { KVmalloc::free(pages.cast(), layout) }; + return Err(EBUSY); + } + + inner.pages =3D pages.as_ptr(); + inner.size =3D num_pages; + inner.vma_addr =3D vma.start(); + + Ok(num_pages) + } + + /// Make sure that the given pages are allocated and mapped. + /// + /// Must not be called from an atomic context. + pub(crate) fn use_range(&self, start: usize, end: usize) -> Result<()>= { + if start >=3D end { + return Ok(()); + } + let mut inner =3D self.lock.lock(); + assert!(end <=3D inner.size); + + for i in start..end { + // SAFETY: This pointer offset is in bounds. + let page_info =3D unsafe { inner.pages.add(i) }; + + // SAFETY: The pointer is valid, and we hold the lock so readi= ng from the page is okay. + if let Some(page) =3D unsafe { PageInfo::get_page(page_info) }= { + // Since we're going to use the page, we should remove it = from the lru list so that + // the shrinker will not free it. + // + // SAFETY: The pointer is valid, and this is the right shr= inker. + // + // The shrinker can't free the page between the check and = this call to + // `list_lru_del` because we hold the lock. + unsafe { PageInfo::list_lru_del(page_info, page.nid(), sel= f.shrinker) }; + } else { + // We have to allocate a new page. Use the slow path. + drop(inner); + // SAFETY: `i < end <=3D inner.size` so `i` is in bounds. + match unsafe { self.use_page_slow(i) } { + Ok(()) =3D> {} + Err(err) =3D> { + pr_warn!("Error in use_page_slow: {:?}", err); + return Err(err); + } + } + inner =3D self.lock.lock(); + } + } + Ok(()) + } + + /// Mark the given page as in use, slow path. + /// + /// Must not be called from an atomic context. + /// + /// # Safety + /// + /// Assumes that `i` is in bounds. + #[cold] + unsafe fn use_page_slow(&self, i: usize) -> Result<()> { + let new_page =3D Page::alloc_page(GFP_KERNEL | __GFP_HIGHMEM | __G= FP_ZERO)?; + + let mm_mutex =3D self.mm_lock.lock(); + let inner =3D self.lock.lock(); + + // SAFETY: This pointer offset is in bounds. + let page_info =3D unsafe { inner.pages.add(i) }; + + // SAFETY: The pointer is valid, and we hold the lock so reading f= rom the page is okay. + if let Some(page) =3D unsafe { PageInfo::get_page(page_info) } { + // The page was already there, or someone else added the page = while we didn't hold the + // spinlock. + // + // SAFETY: The pointer is valid, and this is the right shrinke= r. + // + // The shrinker can't free the page between the check and this= call to + // `list_lru_del` because we hold the lock. + unsafe { PageInfo::list_lru_del(page_info, page.nid(), self.sh= rinker) }; + return Ok(()); + } + + let vma_addr =3D inner.vma_addr; + // Release the spinlock while we insert the page into the vma. + drop(inner); + + // No overflow since we stay in bounds of the vma. + let user_page_addr =3D vma_addr + (i << PAGE_SHIFT); + + // We use `mmput_async` when dropping the `mm` because `use_page_s= low` is usually used from + // a remote process. If the call to `mmput` races with the process= shutting down, then the + // caller of `use_page_slow` becomes responsible for cleaning up t= he `mm`, which doesn't + // happen until it returns to userspace. However, the caller might= instead go to sleep and + // wait for the owner of the `mm` to wake it up, which doesn't hap= pen because it's in the + // middle of a shutdown process that won't complete until the `mm`= is dropped. This can + // amount to a deadlock. + // + // Using `mmput_async` avoids this, because then the `mm` cleanup = is instead queued to a + // workqueue. 
+ MmWithUser::into_mmput_async(self.mm.mmget_not_zero().ok_or(ESRCH)= ?) + .mmap_read_lock() + .vma_lookup(vma_addr) + .ok_or(ESRCH)? + .as_mixedmap_vma() + .ok_or(ESRCH)? + .vm_insert_page(user_page_addr, &new_page) + .inspect_err(|err| { + pr_warn!( + "Failed to vm_insert_page({}): vma_addr:{} i:{} err:{:= ?}", + user_page_addr, + vma_addr, + i, + err + ) + })?; + + let inner =3D self.lock.lock(); + + // SAFETY: The `page_info` pointer is valid and currently does not= have a page. The page + // can be written to since we hold the lock. + // + // We released and reacquired the spinlock since we checked that t= he page is null, but we + // always hold the mm_lock mutex when setting the page to a non-nu= ll value, so it's not + // possible for someone else to have changed it since our check. + unsafe { PageInfo::set_page(page_info, new_page) }; + + drop(inner); + drop(mm_mutex); + + Ok(()) + } + + /// If the given page is in use, then mark it as available so that the= shrinker can free it. + /// + /// May be called from an atomic context. + pub(crate) fn stop_using_range(&self, start: usize, end: usize) { + if start >=3D end { + return; + } + let inner =3D self.lock.lock(); + assert!(end <=3D inner.size); + + for i in (start..end).rev() { + // SAFETY: The pointer is in bounds. + let page_info =3D unsafe { inner.pages.add(i) }; + + // SAFETY: Okay for reading since we have the lock. + if let Some(page) =3D unsafe { PageInfo::get_page(page_info) }= { + // SAFETY: The pointer is valid, and it's the right shrink= er. + unsafe { PageInfo::list_lru_add(page_info, page.nid(), sel= f.shrinker) }; + } + } + } + + /// Helper for reading or writing to a range of bytes that may overlap= with several pages. + /// + /// # Safety + /// + /// All pages touched by this operation must be in use for the duratio= n of this call. + unsafe fn iterate(&self, mut offset: usize, mut size: usize, mut cb= : T) -> Result + where + T: FnMut(&Page, usize, usize) -> Result, + { + if size =3D=3D 0 { + return Ok(()); + } + + let (pages, num_pages) =3D { + let inner =3D self.lock.lock(); + (inner.pages, inner.size) + }; + let num_bytes =3D num_pages << PAGE_SHIFT; + + // Check that the request is within the buffer. + if offset.checked_add(size).ok_or(EFAULT)? > num_bytes { + return Err(EFAULT); + } + + let mut page_index =3D offset >> PAGE_SHIFT; + offset &=3D PAGE_SIZE - 1; + while size > 0 { + let available =3D usize::min(size, PAGE_SIZE - offset); + // SAFETY: The pointer is in bounds. + let page_info =3D unsafe { pages.add(page_index) }; + // SAFETY: The caller guarantees that this page is in the "in = use" state for the + // duration of this call to `iterate`, so nobody will change t= he page. + let page =3D unsafe { PageInfo::get_page(page_info) }; + if page.is_none() { + pr_warn!("Page is null!"); + } + let page =3D page.ok_or(EFAULT)?; + cb(page, offset, available)?; + size -=3D available; + page_index +=3D 1; + offset =3D 0; + } + Ok(()) + } + + /// Copy from userspace into this page range. + /// + /// # Safety + /// + /// All pages touched by this operation must be in use for the duratio= n of this call. + pub(crate) unsafe fn copy_from_user_slice( + &self, + reader: &mut UserSliceReader, + offset: usize, + size: usize, + ) -> Result { + // SAFETY: `self.iterate` has the same safety requirements as `cop= y_from_user_slice`. + unsafe { + self.iterate(offset, size, |page, offset, to_copy| { + page.copy_from_user_slice_raw(reader, offset, to_copy) + }) + } + } + + /// Copy from this page range into kernel space. 
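+    ///
+    /// The `T`-sized span may straddle a page boundary, in which case the value is
+    /// assembled piecewise from each page it touches.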
+ /// + /// # Safety + /// + /// All pages touched by this operation must be in use for the duratio= n of this call. + pub(crate) unsafe fn read(&self, offset: usize) -> Resul= t { + let mut out =3D MaybeUninit::::uninit(); + let mut out_offset =3D 0; + // SAFETY: `self.iterate` has the same safety requirements as `rea= d`. + unsafe { + self.iterate(offset, size_of::(), |page, offset, to_copy| { + // SAFETY: The sum of `offset` and `to_copy` is bounded by= the size of T. + let obj_ptr =3D (out.as_mut_ptr() as *mut u8).add(out_offs= et); + // SAFETY: The pointer points is in-bounds of the `out` va= riable, so it is valid. + page.read_raw(obj_ptr, offset, to_copy)?; + out_offset +=3D to_copy; + Ok(()) + })?; + } + // SAFETY: We just initialised the data. + Ok(unsafe { out.assume_init() }) + } + + /// Copy from kernel space into this page range. + /// + /// # Safety + /// + /// All pages touched by this operation must be in use for the duratio= n of this call. + pub(crate) unsafe fn write(&self, offset: usize, obj: &T) -= > Result { + let mut obj_offset =3D 0; + // SAFETY: `self.iterate` has the same safety requirements as `wri= te`. + unsafe { + self.iterate(offset, size_of_val(obj), |page, offset, to_copy|= { + // SAFETY: The sum of `offset` and `to_copy` is bounded by= the size of T. + let obj_ptr =3D (obj as *const T as *const u8).add(obj_off= set); + // SAFETY: We have a reference to the object, so the point= er is valid. + page.write_raw(obj_ptr, offset, to_copy)?; + obj_offset +=3D to_copy; + Ok(()) + }) + } + } + + /// Write zeroes to the given range. + /// + /// # Safety + /// + /// All pages touched by this operation must be in use for the duratio= n of this call. + pub(crate) unsafe fn fill_zero(&self, offset: usize, size: usize) -> R= esult { + // SAFETY: `self.iterate` has the same safety requirements as `cop= y_into`. + unsafe { + self.iterate(offset, size, |page, offset, len| { + page.fill_zero_raw(offset, len) + }) + } + } +} + +#[pinned_drop] +impl PinnedDrop for ShrinkablePageRange { + fn drop(self: Pin<&mut Self>) { + let (pages, size) =3D { + let lock =3D self.lock.lock(); + (lock.pages, lock.size) + }; + + if size =3D=3D 0 { + return; + } + + // Note: This call is also necessary for the safety of `stable_try= lock_mm`. + let mm_lock =3D self.mm_lock.lock(); + + // This is the destructor, so unlike the other methods, we only ne= ed to worry about races + // with the shrinker here. Since we hold the mm_lock, we also can'= t race with the shrinker. + for i in 0..size { + // SAFETY: Loop is in-bounds of the size. + let p_ptr =3D unsafe { pages.add(i) }; + // SAFETY: No other readers, so we can read. + if let Some(p) =3D unsafe { PageInfo::get_page(p_ptr) } { + // SAFETY: The pointer is valid and it's the right shrinke= r. + unsafe { PageInfo::list_lru_del(p_ptr, p.nid(), self.shrin= ker) }; + // SAFETY: No other readers, so we can write. + unsafe { drop(PageInfo::take_page(p_ptr)) }; + } + } + + drop(mm_lock); + + let Some(pages) =3D NonNull::new(pages) else { + return; + }; + + // SAFETY: This computation did not overflow when allocating the p= ages array, so it will + // not overflow this time. + let layout =3D unsafe { Layout::array::(size).unwrap_unc= hecked() }; + + // SAFETY: The `pages` array was allocated with the same layout. + unsafe { KVmalloc::free(pages.cast(), layout) }; + } +} + +/// # Safety +/// Called by the shrinker. 
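+/// In particular, `shrink` must be the shrinker registered by this driver, so that its
+/// `private_data` field points at the `list_lru` used by `ShrinkablePageRange`.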
+#[no_mangle] +unsafe extern "C" fn rust_shrink_count( + shrink: *mut bindings::shrinker, + _sc: *mut bindings::shrink_control, +) -> c_ulong { + // SAFETY: We can access our own private data. + let list_lru =3D unsafe { (*shrink).private_data.cast::() }; + // SAFETY: Accessing the lru list is okay. Just an FFI call. + unsafe { bindings::list_lru_count(list_lru) } +} + +/// # Safety +/// Called by the shrinker. +#[no_mangle] +unsafe extern "C" fn rust_shrink_scan( + shrink: *mut bindings::shrinker, + sc: *mut bindings::shrink_control, +) -> c_ulong { + // SAFETY: We can access our own private data. + let list_lru =3D unsafe { (*shrink).private_data.cast::() }; + // SAFETY: Caller guarantees that it is safe to read this field. + let nr_to_scan =3D unsafe { (*sc).nr_to_scan }; + // SAFETY: Accessing the lru list is okay. Just an FFI call. + unsafe { + bindings::list_lru_walk( + list_lru, + Some(bindings::rust_shrink_free_page_wrap), + ptr::null_mut(), + nr_to_scan, + ) + } +} + +const LRU_SKIP: bindings::lru_status =3D bindings::lru_status_LRU_SKIP; +const LRU_REMOVED_ENTRY: bindings::lru_status =3D bindings::lru_status_LRU= _REMOVED_RETRY; + +/// # Safety +/// Called by the shrinker. +#[no_mangle] +unsafe extern "C" fn rust_shrink_free_page( + item: *mut bindings::list_head, + lru: *mut bindings::list_lru_one, + _cb_arg: *mut c_void, +) -> bindings::lru_status { + // Fields that should survive after unlocking the lru lock. + let page; + let page_index; + let mm; + let mmap_read; + let mm_mutex; + let vma_addr; + + { + // CAST: The `list_head` field is first in `PageInfo`. + let info =3D item as *mut PageInfo; + // SAFETY: The `range` field of `PageInfo` is immutable. + let range =3D unsafe { &*((*info).range) }; + + mm =3D match range.mm.mmget_not_zero() { + Some(mm) =3D> MmWithUser::into_mmput_async(mm), + None =3D> return LRU_SKIP, + }; + + mm_mutex =3D match range.stable_trylock_mm() { + Some(guard) =3D> guard, + None =3D> return LRU_SKIP, + }; + + mmap_read =3D match mm.mmap_read_trylock() { + Some(guard) =3D> guard, + None =3D> return LRU_SKIP, + }; + + // We can't lock it normally here, since we hold the lru lock. + let inner =3D match range.lock.try_lock() { + Some(inner) =3D> inner, + None =3D> return LRU_SKIP, + }; + + // SAFETY: The item is in this lru list, so it's okay to remove it. + unsafe { bindings::list_lru_isolate(lru, item) }; + + // SAFETY: Both pointers are in bounds of the same allocation. + page_index =3D unsafe { info.offset_from(inner.pages) } as usize; + + // SAFETY: We hold the spinlock, so we can take the page. + // + // This sets the page pointer to zero before we unmap it from the = vma. However, we call + // `zap_page_range` before we release the mmap lock, so `use_page_= slow` will not be able to + // insert a new page until after our call to `zap_page_range`. + page =3D unsafe { PageInfo::take_page(info) }; + vma_addr =3D inner.vma_addr; + + // From this point on, we don't access this PageInfo or Shrinkable= PageRange again, because + // they can be freed at any point after we unlock `lru_lock`. This= is with the exception of + // `mm_mutex` which is kept alive by holding the lock. + } + + // SAFETY: The lru lock is locked when this method is called. 
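+ //
+ // Drop it while we unmap the page from the vma, and take it again right before returning,
+ // as the lru walk expects the lock to be held when this callback returns.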
+ unsafe { bindings::spin_unlock(&raw mut (*lru).lock) }; + + if let Some(vma) =3D mmap_read.vma_lookup(vma_addr) { + let user_page_addr =3D vma_addr + (page_index << PAGE_SHIFT); + vma.zap_page_range_single(user_page_addr, PAGE_SIZE); + } + + drop(mmap_read); + drop(mm_mutex); + drop(mm); + drop(page); + + // SAFETY: We just unlocked the lru lock, but it should be locked when= we return. + unsafe { bindings::spin_lock(&raw mut (*lru).lock) }; + + LRU_REMOVED_ENTRY +} diff --git a/drivers/android/binder/page_range_helper.c b/drivers/android/b= inder/page_range_helper.c new file mode 100644 index 0000000000000000000000000000000000000000..496887723ee003e910d6ce67dba= dd8c5286e39d1 --- /dev/null +++ b/drivers/android/binder/page_range_helper.c @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* C helper for page_range.rs to work around a CFI violation. + * + * Bindgen currently pretends that `enum lru_status` is the same as an int= eger. + * This assumption is fine ABI-wise, but once you add CFI to the mix, it + * triggers a CFI violation because `enum lru_status` gets a different CFI= tag. + * + * This file contains a workaround until bindgen can be fixed. + * + * Copyright (C) 2025 Google LLC. + */ +#include "page_range_helper.h" + +unsigned int rust_shrink_free_page(struct list_head *item, + struct list_lru_one *list, + void *cb_arg); + +enum lru_status +rust_shrink_free_page_wrap(struct list_head *item, struct list_lru_one *li= st, + void *cb_arg) +{ + return rust_shrink_free_page(item, list, cb_arg); +} diff --git a/drivers/android/binder/page_range_helper.h b/drivers/android/b= inder/page_range_helper.h new file mode 100644 index 0000000000000000000000000000000000000000..18dd2dd117b253fcbac735b4803= 2b8f2d53d11fe --- /dev/null +++ b/drivers/android/binder/page_range_helper.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2025 Google, Inc. + */ + +#ifndef _LINUX_PAGE_RANGE_HELPER_H +#define _LINUX_PAGE_RANGE_HELPER_H + +#include + +enum lru_status +rust_shrink_free_page_wrap(struct list_head *item, struct list_lru_one *li= st, + void *cb_arg); + +#endif /* _LINUX_PAGE_RANGE_HELPER_H */ diff --git a/drivers/android/binder/process.rs b/drivers/android/binder/pro= cess.rs new file mode 100644 index 0000000000000000000000000000000000000000..f13a747e784c84a0fb09cbf4744= 2712106eba07c --- /dev/null +++ b/drivers/android/binder/process.rs @@ -0,0 +1,1696 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. + +//! This module defines the `Process` type, which represents a process usi= ng a particular binder +//! context. +//! +//! The `Process` object keeps track of all of the resources that this pro= cess owns in the binder +//! context. +//! +//! There is one `Process` object for each binder fd that a process has op= ened, so processes using +//! several binder contexts have several `Process` objects. This ensures t= hat the contexts are +//! fully separated. 
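+//!
+//! The pages backing each process' mmap are managed by the `ShrinkablePageRange` type from
+//! `page_range.rs`, and the mapped region is carved into per-transaction buffers by the
+//! allocator in `range_alloc/`.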
+ +use core::mem::take; + +use kernel::{ + bindings, + cred::Credential, + error::Error, + fs::file::{self, File}, + list::{List, ListArc, ListArcField, ListLinks}, + mm, + prelude::*, + rbtree::{self, RBTree, RBTreeNode, RBTreeNodeReservation}, + seq_file::SeqFile, + seq_print, + sync::poll::PollTable, + sync::{ + lock::{spinlock::SpinLockBackend, Guard}, + Arc, ArcBorrow, CondVar, CondVarTimeoutResult, Mutex, SpinLock, Un= iqueArc, + }, + task::Task, + types::ARef, + uaccess::{UserSlice, UserSliceReader}, + uapi, + workqueue::{self, Work}, +}; + +use crate::{ + allocation::{Allocation, AllocationInfo, NewAllocation}, + context::Context, + defs::*, + error::{BinderError, BinderResult}, + node::{CouldNotDeliverCriticalIncrement, CritIncrWrapper, Node, NodeDe= ath, NodeRef}, + page_range::ShrinkablePageRange, + range_alloc::{RangeAllocator, ReserveNew, ReserveNewArgs}, + stats::BinderStats, + thread::{PushWorkRes, Thread}, + BinderfsProcFile, DArc, DLArc, DTRWrap, DeliverToRead, +}; + +#[path =3D "freeze.rs"] +mod freeze; +use self::freeze::{FreezeCookie, FreezeListener}; + +struct Mapping { + address: usize, + alloc: RangeAllocator, +} + +impl Mapping { + fn new(address: usize, size: usize) -> Self { + Self { + address, + alloc: RangeAllocator::new(size), + } + } +} + +// bitflags for defer_work. +const PROC_DEFER_FLUSH: u8 =3D 1; +const PROC_DEFER_RELEASE: u8 =3D 2; + +/// The fields of `Process` protected by the spinlock. +pub(crate) struct ProcessInner { + is_manager: bool, + pub(crate) is_dead: bool, + threads: RBTree>, + /// INVARIANT: Threads pushed to this list must be owned by this proce= ss. + ready_threads: List, + nodes: RBTree>, + mapping: Option, + work: List>, + delivered_deaths: List, 2>, + + /// The number of requested threads that haven't registered yet. + requested_thread_count: u32, + /// The maximum number of threads used by the process thread pool. + max_threads: u32, + /// The number of threads the started and registered with the thread p= ool. + started_thread_count: u32, + + /// Bitmap of deferred work to do. + defer_work: u8, + + /// Number of transactions to be transmitted before processes in freez= e_wait + /// are woken up. + outstanding_txns: u32, + /// Process is frozen and unable to service binder transactions. + pub(crate) is_frozen: bool, + /// Process received sync transactions since last frozen. + pub(crate) sync_recv: bool, + /// Process received async transactions since last frozen. + pub(crate) async_recv: bool, + pub(crate) binderfs_file: Option, + /// Check for oneway spam + oneway_spam_detection_enabled: bool, +} + +impl ProcessInner { + fn new() -> Self { + Self { + is_manager: false, + is_dead: false, + threads: RBTree::new(), + ready_threads: List::new(), + mapping: None, + nodes: RBTree::new(), + work: List::new(), + delivered_deaths: List::new(), + requested_thread_count: 0, + max_threads: 0, + started_thread_count: 0, + defer_work: 0, + outstanding_txns: 0, + is_frozen: false, + sync_recv: false, + async_recv: false, + binderfs_file: None, + oneway_spam_detection_enabled: false, + } + } + + /// Schedule the work item for execution on this process. + /// + /// If any threads are ready for work, then the work item is given dir= ectly to that thread and + /// it is woken up. Otherwise, it is pushed to the process work list. + /// + /// This call can fail only if the process is dead. In this case, the = work item is returned to + /// the caller so that the caller can drop it after releasing the inne= r process lock. 
This is + /// necessary since the destructor of `Transaction` will take locks th= at can't necessarily be + /// taken while holding the inner process lock. + pub(crate) fn push_work( + &mut self, + work: DLArc, + ) -> Result<(), (BinderError, DLArc)> { + // Try to find a ready thread to which to push the work. + if let Some(thread) =3D self.ready_threads.pop_front() { + // Push to thread while holding state lock. This prevents the = thread from giving up + // (for example, because of a signal) when we're about to deli= ver work. + match thread.push_work(work) { + PushWorkRes::Ok =3D> Ok(()), + PushWorkRes::FailedDead(work) =3D> Err((BinderError::new_d= ead(), work)), + } + } else if self.is_dead { + Err((BinderError::new_dead(), work)) + } else { + let sync =3D work.should_sync_wakeup(); + + // Didn't find a thread waiting for proc work; this can happen + // in two scenarios: + // 1. All threads are busy handling transactions + // In that case, one of those threads should call back into + // the kernel driver soon and pick up this work. + // 2. Threads are using the (e)poll interface, in which case + // they may be blocked on the waitqueue without having been + // added to waiting_threads. For this case, we just iterate + // over all threads not handling transaction work, and + // wake them all up. We wake all because we don't know whet= her + // a thread that called into (e)poll is handling non-binder + // work currently. + self.work.push_back(work); + + // Wake up polling threads, if any. + for thread in self.threads.values() { + thread.notify_if_poll_ready(sync); + } + + Ok(()) + } + } + + pub(crate) fn remove_node(&mut self, ptr: u64) { + self.nodes.remove(&ptr); + } + + /// Updates the reference count on the given node. + pub(crate) fn update_node_refcount( + &mut self, + node: &DArc, + inc: bool, + strong: bool, + count: usize, + othread: Option<&Thread>, + ) { + let push =3D node.update_refcount_locked(inc, strong, count, self); + + // If we decided that we need to push work, push either to the pro= cess or to a thread if + // one is specified. + if let Some(node) =3D push { + if let Some(thread) =3D othread { + thread.push_work_deferred(node); + } else { + let _ =3D self.push_work(node); + // Nothing to do: `push_work` may fail if the process is d= ead, but that's ok as in + // that case, it doesn't care about the notification. + } + } + } + + pub(crate) fn new_node_ref( + &mut self, + node: DArc, + strong: bool, + thread: Option<&Thread>, + ) -> NodeRef { + self.update_node_refcount(&node, true, strong, 1, thread); + let strong_count =3D if strong { 1 } else { 0 }; + NodeRef::new(node, strong_count, 1 - strong_count) + } + + pub(crate) fn new_node_ref_with_thread( + &mut self, + node: DArc, + strong: bool, + thread: &Thread, + wrapper: Option, + ) -> Result { + let push =3D match wrapper { + None =3D> node + .incr_refcount_allow_zero2one(strong, self)? + .map(|node| node as _), + Some(wrapper) =3D> node.incr_refcount_allow_zero2one_with_wrap= per(strong, wrapper, self), + }; + if let Some(node) =3D push { + thread.push_work_deferred(node); + } + let strong_count =3D if strong { 1 } else { 0 }; + Ok(NodeRef::new(node, strong_count, 1 - strong_count)) + } + + /// Returns an existing node with the given pointer and cookie, if one= exists. + /// + /// Returns an error if a node with the given pointer but a different = cookie exists. 
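+ ///
+ /// Only the node table of this process is consulted, so nodes owned by other processes
+ /// are never returned.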
+ fn get_existing_node(&self, ptr: u64, cookie: u64) -> Result>> { + match self.nodes.get(&ptr) { + None =3D> Ok(None), + Some(node) =3D> { + let (_, node_cookie) =3D node.get_id(); + if node_cookie =3D=3D cookie { + Ok(Some(node.clone())) + } else { + Err(EINVAL) + } + } + } + } + + fn register_thread(&mut self) -> bool { + if self.requested_thread_count =3D=3D 0 { + return false; + } + + self.requested_thread_count -=3D 1; + self.started_thread_count +=3D 1; + true + } + + /// Finds a delivered death notification with the given cookie, remove= s it from the thread's + /// delivered list, and returns it. + fn pull_delivered_death(&mut self, cookie: u64) -> Option> { + let mut cursor =3D self.delivered_deaths.cursor_front(); + while let Some(next) =3D cursor.peek_next() { + if next.cookie =3D=3D cookie { + return Some(next.remove().into_arc()); + } + cursor.move_next(); + } + None + } + + pub(crate) fn death_delivered(&mut self, death: DArc) { + if let Some(death) =3D ListArc::try_from_arc_or_drop(death) { + self.delivered_deaths.push_back(death); + } else { + pr_warn!("Notification added to `delivered_deaths` twice."); + } + } + + pub(crate) fn add_outstanding_txn(&mut self) { + self.outstanding_txns +=3D 1; + } + + fn txns_pending_locked(&self) -> bool { + if self.outstanding_txns > 0 { + return true; + } + for thread in self.threads.values() { + if thread.has_current_transaction() { + return true; + } + } + false + } +} + +/// Used to keep track of a node that this process has a handle to. +#[pin_data] +pub(crate) struct NodeRefInfo { + debug_id: usize, + /// The refcount that this process owns to the node. + node_ref: ListArcField, + death: ListArcField>, { Self::LIST_PROC }>, + /// Cookie of the active freeze listener for this node. + freeze: ListArcField, { Self::LIST_PROC }>, + /// Used to store this `NodeRefInfo` in the node's `refs` list. + #[pin] + links: ListLinks<{ Self::LIST_NODE }>, + /// The handle for this `NodeRefInfo`. + handle: u32, + /// The process that has a handle to the node. + pub(crate) process: Arc, +} + +impl NodeRefInfo { + /// The id used for the `Node::refs` list. + pub(crate) const LIST_NODE: u64 =3D 0x2da16350fb724a10; + /// The id used for the `ListArc` in `ProcessNodeRefs`. + const LIST_PROC: u64 =3D 0xd703a5263dcc8650; + + fn new(node_ref: NodeRef, handle: u32, process: Arc) -> impl = PinInit { + pin_init!(Self { + debug_id: super::next_debug_id(), + node_ref: ListArcField::new(node_ref), + death: ListArcField::new(None), + freeze: ListArcField::new(None), + links <- ListLinks::new(), + handle, + process, + }) + } + + kernel::list::define_list_arc_field_getter! { + pub(crate) fn death(&mut self<{Self::LIST_PROC}>) -> &mut Option> { death } + pub(crate) fn freeze(&mut self<{Self::LIST_PROC}>) -> &mut Option<= FreezeCookie> { freeze } + pub(crate) fn node_ref(&mut self<{Self::LIST_PROC}>) -> &mut NodeR= ef { node_ref } + pub(crate) fn node_ref2(&self<{Self::LIST_PROC}>) -> &NodeRef { no= de_ref } + } +} + +kernel::list::impl_list_arc_safe! { + impl ListArcSafe<{Self::LIST_NODE}> for NodeRefInfo { untracked; } + impl ListArcSafe<{Self::LIST_PROC}> for NodeRefInfo { untracked; } +} +kernel::list::impl_list_item! { + impl ListItem<{Self::LIST_NODE}> for NodeRefInfo { + using ListLinks { self.links }; + } +} + +/// Keeps track of references this process has to nodes owned by other pro= cesses. +/// +/// TODO: Currently, the rbtree requires two allocations per node referenc= e, and two tree +/// traversals to look up a node by `Node::global_id`. 
Once the rbtree is = more powerful, these +/// extra costs should be eliminated. +struct ProcessNodeRefs { + /// Used to look up nodes using the 32-bit id that this process knows = it by. + by_handle: RBTree>, + /// Used to look up nodes without knowing their local 32-bit id. The u= size is the address of + /// the underlying `Node` struct as returned by `Node::global_id`. + by_node: RBTree, + /// Used to look up a `FreezeListener` by cookie. + /// + /// There might be multiple freeze listeners for the same node, but at= most one of them is + /// active. + freeze_listeners: RBTree, +} + +impl ProcessNodeRefs { + fn new() -> Self { + Self { + by_handle: RBTree::new(), + by_node: RBTree::new(), + freeze_listeners: RBTree::new(), + } + } +} + +/// A process using binder. +/// +/// Strictly speaking, there can be multiple of these per process. There i= s one for each binder fd +/// that a process has opened, so processes using several binder contexts = have several `Process` +/// objects. This ensures that the contexts are fully separated. +#[pin_data] +pub(crate) struct Process { + pub(crate) ctx: Arc, + + // The task leader (process). + pub(crate) task: ARef, + + // Credential associated with file when `Process` is created. + pub(crate) cred: ARef, + + #[pin] + pub(crate) inner: SpinLock, + + #[pin] + pub(crate) pages: ShrinkablePageRange, + + // Waitqueue of processes waiting for all outstanding transactions to = be + // processed. + #[pin] + freeze_wait: CondVar, + + // Node references are in a different lock to avoid recursive acquisit= ion when + // incrementing/decrementing a node in another process. + #[pin] + node_refs: Mutex, + + // Work node for deferred work item. + #[pin] + defer_work: Work, + + // Links for process list in Context. + #[pin] + links: ListLinks, + + pub(crate) stats: BinderStats, +} + +kernel::impl_has_work! { + impl HasWork for Process { self.defer_work } +} + +kernel::list::impl_list_arc_safe! { + impl ListArcSafe<0> for Process { untracked; } +} +kernel::list::impl_list_item! 
{ + impl ListItem<0> for Process { + using ListLinks { self.links }; + } +} + +impl workqueue::WorkItem for Process { + type Pointer =3D Arc; + + fn run(me: Arc) { + let defer; + { + let mut inner =3D me.inner.lock(); + defer =3D inner.defer_work; + inner.defer_work =3D 0; + } + + if defer & PROC_DEFER_FLUSH !=3D 0 { + me.deferred_flush(); + } + if defer & PROC_DEFER_RELEASE !=3D 0 { + me.deferred_release(); + } + } +} + +impl Process { + fn new(ctx: Arc, cred: ARef) -> Result>= { + let current =3D kernel::current!(); + let list_process =3D ListArc::pin_init::( + try_pin_init!(Process { + ctx, + cred, + inner <- kernel::new_spinlock!(ProcessInner::new(), "Proce= ss::inner"), + pages <- ShrinkablePageRange::new(&super::BINDER_SHRINKER), + node_refs <- kernel::new_mutex!(ProcessNodeRefs::new(), "P= rocess::node_refs"), + freeze_wait <- kernel::new_condvar!("Process::freeze_wait"= ), + task: current.group_leader().into(), + defer_work <- kernel::new_work!("Process::defer_work"), + links <- ListLinks::new(), + stats: BinderStats::new(), + }), + GFP_KERNEL, + )?; + + let process =3D list_process.clone_arc(); + process.ctx.register_process(list_process); + + Ok(process) + } + + pub(crate) fn pid_in_current_ns(&self) -> kernel::task::Pid { + self.task.tgid_nr_ns(None) + } + + #[inline(never)] + pub(crate) fn debug_print_stats(&self, m: &SeqFile, ctx: &Context) -> = Result<()> { + seq_print!(m, "proc {}\n", self.pid_in_current_ns()); + seq_print!(m, "context {}\n", &*ctx.name); + + let inner =3D self.inner.lock(); + seq_print!(m, " threads: {}\n", inner.threads.iter().count()); + seq_print!( + m, + " requested threads: {}+{}/{}\n", + inner.requested_thread_count, + inner.started_thread_count, + inner.max_threads, + ); + if let Some(mapping) =3D &inner.mapping { + seq_print!( + m, + " free oneway space: {}\n", + mapping.alloc.free_oneway_space() + ); + seq_print!(m, " buffers: {}\n", mapping.alloc.count_buffers()= ); + } + seq_print!( + m, + " outstanding transactions: {}\n", + inner.outstanding_txns + ); + seq_print!(m, " nodes: {}\n", inner.nodes.iter().count()); + drop(inner); + + { + let mut refs =3D self.node_refs.lock(); + let (mut count, mut weak, mut strong) =3D (0, 0, 0); + for r in refs.by_handle.values_mut() { + let node_ref =3D r.node_ref(); + let (nstrong, nweak) =3D node_ref.get_count(); + count +=3D 1; + weak +=3D nweak; + strong +=3D nstrong; + } + seq_print!(m, " refs: {count} s {strong} w {weak}\n"); + } + + self.stats.debug_print(" ", m); + + Ok(()) + } + + #[inline(never)] + pub(crate) fn debug_print(&self, m: &SeqFile, ctx: &Context, print_all= : bool) -> Result<()> { + seq_print!(m, "proc {}\n", self.pid_in_current_ns()); + seq_print!(m, "context {}\n", &*ctx.name); + + let mut all_threads =3D KVec::new(); + let mut all_nodes =3D KVec::new(); + loop { + let inner =3D self.inner.lock(); + let num_threads =3D inner.threads.iter().count(); + let num_nodes =3D inner.nodes.iter().count(); + + if all_threads.capacity() < num_threads || all_nodes.capacity(= ) < num_nodes { + drop(inner); + all_threads.reserve(num_threads, GFP_KERNEL)?; + all_nodes.reserve(num_nodes, GFP_KERNEL)?; + continue; + } + + for thread in inner.threads.values() { + assert!(all_threads.len() < all_threads.capacity()); + let _ =3D all_threads.push(thread.clone(), GFP_ATOMIC); + } + + for node in inner.nodes.values() { + assert!(all_nodes.len() < all_nodes.capacity()); + let _ =3D all_nodes.push(node.clone(), GFP_ATOMIC); + } + + break; + } + + for thread in all_threads { + thread.debug_print(m, print_all)?; + } 
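+
+ // The threads are printed without holding the inner lock; the nodes below are printed
+ // under it, since the node printing helpers take the lock guard.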
+ + let mut inner =3D self.inner.lock(); + for node in all_nodes { + if print_all || node.has_oneway_transaction(&mut inner) { + node.full_debug_print(m, &mut inner)?; + } + } + drop(inner); + + if print_all { + let mut refs =3D self.node_refs.lock(); + for r in refs.by_handle.values_mut() { + let node_ref =3D r.node_ref(); + let dead =3D node_ref.node.owner.inner.lock().is_dead; + let (strong, weak) =3D node_ref.get_count(); + let debug_id =3D node_ref.node.debug_id; + + seq_print!( + m, + " ref {}: desc {} {}node {debug_id} s {strong} w {wea= k}", + r.debug_id, + r.handle, + if dead { "dead " } else { "" }, + ); + } + } + + let inner =3D self.inner.lock(); + for work in &inner.work { + work.debug_print(m, " ", " pending transaction ")?; + } + for _death in &inner.delivered_deaths { + seq_print!(m, " has delivered dead binder\n"); + } + if let Some(mapping) =3D &inner.mapping { + mapping.alloc.debug_print(m)?; + } + drop(inner); + + Ok(()) + } + + /// Attempts to fetch a work item from the process queue. + pub(crate) fn get_work(&self) -> Option> { + self.inner.lock().work.pop_front() + } + + /// Attempts to fetch a work item from the process queue. If none is a= vailable, it registers the + /// given thread as ready to receive work directly. + /// + /// This must only be called when the thread is not participating in a= transaction chain; when + /// it is, work will always be delivered directly to the thread (and n= ot through the process + /// queue). + pub(crate) fn get_work_or_register<'a>( + &'a self, + thread: &'a Arc, + ) -> GetWorkOrRegister<'a> { + let mut inner =3D self.inner.lock(); + // Try to get work from the process queue. + if let Some(work) =3D inner.work.pop_front() { + return GetWorkOrRegister::Work(work); + } + + // Register the thread as ready. + GetWorkOrRegister::Register(Registration::new(thread, &mut inner)) + } + + fn get_current_thread(self: ArcBorrow<'_, Self>) -> Result= > { + let id =3D { + let current =3D kernel::current!(); + if !core::ptr::eq(current.group_leader(), &*self.task) { + pr_err!("get_current_thread was called from the wrong proc= ess."); + return Err(EINVAL); + } + current.pid() + }; + + { + let inner =3D self.inner.lock(); + if let Some(thread) =3D inner.threads.get(&id) { + return Ok(thread.clone()); + } + } + + // Allocate a new `Thread` without holding any locks. + let reservation =3D RBTreeNodeReservation::new(GFP_KERNEL)?; + let ta: Arc =3D Thread::new(id, self.into())?; + + let mut inner =3D self.inner.lock(); + match inner.threads.entry(id) { + rbtree::Entry::Vacant(entry) =3D> { + entry.insert(ta.clone(), reservation); + Ok(ta) + } + rbtree::Entry::Occupied(_entry) =3D> { + pr_err!("Cannot create two threads with the same id."); + Err(EINVAL) + } + } + } + + pub(crate) fn push_work(&self, work: DLArc) -> Bind= erResult { + // If push_work fails, drop the work item outside the lock. + let res =3D self.inner.lock().push_work(work); + match res { + Ok(()) =3D> Ok(()), + Err((err, work)) =3D> { + drop(work); + Err(err) + } + } + } + + fn set_as_manager( + self: ArcBorrow<'_, Self>, + info: Option, + thread: &Thread, + ) -> Result { + let (ptr, cookie, flags) =3D if let Some(obj) =3D info { + ( + // SAFETY: The object type for this ioctl is implicitly `B= INDER_TYPE_BINDER`, so it + // is safe to access the `binder` field. 
+ unsafe { obj.__bindgen_anon_1.binder }, + obj.cookie, + obj.flags, + ) + } else { + (0, 0, 0) + }; + let node_ref =3D self.get_node(ptr, cookie, flags as _, true, thre= ad)?; + let node =3D node_ref.node.clone(); + self.ctx.set_manager_node(node_ref)?; + self.inner.lock().is_manager =3D true; + + // Force the state of the node to prevent the delivery of acquire/= increfs. + let mut owner_inner =3D node.owner.inner.lock(); + node.force_has_count(&mut owner_inner); + Ok(()) + } + + fn get_node_inner( + self: ArcBorrow<'_, Self>, + ptr: u64, + cookie: u64, + flags: u32, + strong: bool, + thread: &Thread, + wrapper: Option, + ) -> Result> { + // Try to find an existing node. + { + let mut inner =3D self.inner.lock(); + if let Some(node) =3D inner.get_existing_node(ptr, cookie)? { + return Ok(inner.new_node_ref_with_thread(node, strong, thr= ead, wrapper)); + } + } + + // Allocate the node before reacquiring the lock. + let node =3D DTRWrap::arc_pin_init(Node::new(ptr, cookie, flags, s= elf.into()))?.into_arc(); + let rbnode =3D RBTreeNode::new(ptr, node.clone(), GFP_KERNEL)?; + let mut inner =3D self.inner.lock(); + if let Some(node) =3D inner.get_existing_node(ptr, cookie)? { + return Ok(inner.new_node_ref_with_thread(node, strong, thread,= wrapper)); + } + + inner.nodes.insert(rbnode); + // This can only fail if someone has already pushed the node to a = list, but we just created + // it and still hold the lock, so it can't fail right now. + let node_ref =3D inner + .new_node_ref_with_thread(node, strong, thread, wrapper) + .unwrap(); + + Ok(Ok(node_ref)) + } + + pub(crate) fn get_node( + self: ArcBorrow<'_, Self>, + ptr: u64, + cookie: u64, + flags: u32, + strong: bool, + thread: &Thread, + ) -> Result { + let mut wrapper =3D None; + for _ in 0..2 { + match self.get_node_inner(ptr, cookie, flags, strong, thread, = wrapper) { + Err(err) =3D> return Err(err), + Ok(Ok(node_ref)) =3D> return Ok(node_ref), + Ok(Err(CouldNotDeliverCriticalIncrement)) =3D> { + wrapper =3D Some(CritIncrWrapper::new()?); + } + } + } + // We only get a `CouldNotDeliverCriticalIncrement` error if `wrap= per` is `None`, so the + // loop should run at most twice. + unreachable!() + } + + pub(crate) fn insert_or_update_handle( + self: ArcBorrow<'_, Process>, + node_ref: NodeRef, + is_mananger: bool, + ) -> Result { + { + let mut refs =3D self.node_refs.lock(); + + // Do a lookup before inserting. + if let Some(handle_ref) =3D refs.by_node.get(&node_ref.node.gl= obal_id()) { + let handle =3D *handle_ref; + let info =3D refs.by_handle.get_mut(&handle).unwrap(); + info.node_ref().absorb(node_ref); + return Ok(handle); + } + } + + // Reserve memory for tree nodes. + let reserve1 =3D RBTreeNodeReservation::new(GFP_KERNEL)?; + let reserve2 =3D RBTreeNodeReservation::new(GFP_KERNEL)?; + let info =3D UniqueArc::new_uninit(GFP_KERNEL)?; + + let mut refs =3D self.node_refs.lock(); + + // Do a lookup again as node may have been inserted before the loc= k was reacquired. + if let Some(handle_ref) =3D refs.by_node.get(&node_ref.node.global= _id()) { + let handle =3D *handle_ref; + let info =3D refs.by_handle.get_mut(&handle).unwrap(); + info.node_ref().absorb(node_ref); + return Ok(handle); + } + + // Find id. 
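+ //
+ // Handles are assigned as the smallest value that is not already in use. Handle 0 is
+ // reserved for the context manager, so all other references start the search at 1.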
+ let mut target: u32 =3D if is_mananger { 0 } else { 1 }; + for handle in refs.by_handle.keys() { + if *handle > target { + break; + } + if *handle =3D=3D target { + target =3D target.checked_add(1).ok_or(ENOMEM)?; + } + } + + let gid =3D node_ref.node.global_id(); + let (info_proc, info_node) =3D { + let info_init =3D NodeRefInfo::new(node_ref, target, self.into= ()); + match info.pin_init_with(info_init) { + Ok(info) =3D> ListArc::pair_from_pin_unique(info), + // error is infallible + Err(err) =3D> match err {}, + } + }; + + // Ensure the process is still alive while we insert a new referen= ce. + // + // This releases the lock before inserting the nodes, but since `i= s_dead` is set as the + // first thing in `deferred_release`, process cleanup will not mis= s the items inserted into + // `refs` below. + if self.inner.lock().is_dead { + return Err(ESRCH); + } + + // SAFETY: `info_proc` and `info_node` reference the same node, so= we are inserting + // `info_node` into the right node's `refs` list. + unsafe { info_proc.node_ref2().node.insert_node_info(info_node) }; + + refs.by_node.insert(reserve1.into_node(gid, target)); + refs.by_handle.insert(reserve2.into_node(target, info_proc)); + Ok(target) + } + + pub(crate) fn get_transaction_node(&self, handle: u32) -> BinderResult= { + // When handle is zero, try to get the context manager. + if handle =3D=3D 0 { + Ok(self.ctx.get_manager_node(true)?) + } else { + Ok(self.get_node_from_handle(handle, true)?) + } + } + + pub(crate) fn get_node_from_handle(&self, handle: u32, strong: bool) -= > Result { + self.node_refs + .lock() + .by_handle + .get_mut(&handle) + .ok_or(ENOENT)? + .node_ref() + .clone(strong) + } + + pub(crate) fn remove_from_delivered_deaths(&self, death: &DArc) { + let mut inner =3D self.inner.lock(); + // SAFETY: By the invariant on the `delivered_links` field, this i= s the right linked list. + let removed =3D unsafe { inner.delivered_deaths.remove(death) }; + drop(inner); + drop(removed); + } + + pub(crate) fn update_ref( + self: ArcBorrow<'_, Process>, + handle: u32, + inc: bool, + strong: bool, + ) -> Result { + if inc && handle =3D=3D 0 { + if let Ok(node_ref) =3D self.ctx.get_manager_node(strong) { + if core::ptr::eq(&*self, &*node_ref.node.owner) { + return Err(EINVAL); + } + let _ =3D self.insert_or_update_handle(node_ref, true); + return Ok(()); + } + } + + // To preserve original binder behaviour, we only fail requests wh= ere the manager tries to + // increment references on itself. + let mut refs =3D self.node_refs.lock(); + if let Some(info) =3D refs.by_handle.get_mut(&handle) { + if info.node_ref().update(inc, strong) { + // Clean up death if there is one attached to this node re= ference. + if let Some(death) =3D info.death().take() { + death.set_cleared(true); + self.remove_from_delivered_deaths(&death); + } + + // Remove reference from process tables, and from the node= 's `refs` list. + + // SAFETY: We are removing the `NodeRefInfo` from the righ= t node. + unsafe { info.node_ref2().node.remove_node_info(info) }; + + let id =3D info.node_ref().node.global_id(); + refs.by_handle.remove(&handle); + refs.by_node.remove(&id); + } + } else { + // All refs are cleared in process exit, so this warning is ex= pected in that case. + if !self.inner.lock().is_dead { + pr_warn!("{}: no such ref {handle}\n", self.pid_in_current= _ns()); + } + } + Ok(()) + } + + /// Decrements the refcount of the given node, if one exists. 
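+ ///
+ /// Does nothing if no node with the given `ptr` and `cookie` exists.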
+ pub(crate) fn update_node(&self, ptr: u64, cookie: u64, strong: bool) { + let mut inner =3D self.inner.lock(); + if let Ok(Some(node)) =3D inner.get_existing_node(ptr, cookie) { + inner.update_node_refcount(&node, false, strong, 1, None); + } + } + + pub(crate) fn inc_ref_done(&self, reader: &mut UserSliceReader, strong= : bool) -> Result { + let ptr =3D reader.read::()?; + let cookie =3D reader.read::()?; + let mut inner =3D self.inner.lock(); + if let Ok(Some(node)) =3D inner.get_existing_node(ptr, cookie) { + if let Some(node) =3D node.inc_ref_done_locked(strong, &mut in= ner) { + // This only fails if the process is dead. + let _ =3D inner.push_work(node); + } + } + Ok(()) + } + + pub(crate) fn buffer_alloc( + self: &Arc, + debug_id: usize, + size: usize, + is_oneway: bool, + from_pid: i32, + ) -> BinderResult { + use kernel::page::PAGE_SIZE; + + let mut reserve_new_args =3D ReserveNewArgs { + debug_id, + size, + is_oneway, + pid: from_pid, + ..ReserveNewArgs::default() + }; + + let (new_alloc, addr) =3D loop { + let mut inner =3D self.inner.lock(); + let mapping =3D inner.mapping.as_mut().ok_or_else(BinderError:= :new_dead)?; + let alloc_request =3D match mapping.alloc.reserve_new(reserve_= new_args)? { + ReserveNew::Success(new_alloc) =3D> break (new_alloc, mapp= ing.address), + ReserveNew::NeedAlloc(request) =3D> request, + }; + drop(inner); + // We need to allocate memory and then call `reserve_new` agai= n. + reserve_new_args =3D alloc_request.make_alloc()?; + }; + + let res =3D Allocation::new( + self.clone(), + debug_id, + new_alloc.offset, + size, + addr + new_alloc.offset, + new_alloc.oneway_spam_detected, + ); + + // This allocation will be marked as in use until the `Allocation`= is used to free it. + // + // This method can't be called while holding a lock, so we release= the lock first. It's + // okay for several threads to use the method on the same index at= the same time. In that + // case, one of the calls will allocate the given page (if missing= ), and the other call + // will wait for the other call to finish allocating the page. + // + // We will not call `stop_using_range` in parallel with this on th= e same page, because the + // allocation can only be removed via the destructor of the `Alloc= ation` object that we + // currently own. + match self.pages.use_range( + new_alloc.offset / PAGE_SIZE, + (new_alloc.offset + size).div_ceil(PAGE_SIZE), + ) { + Ok(()) =3D> {} + Err(err) =3D> { + pr_warn!("use_range failure {:?}", err); + return Err(err.into()); + } + } + + Ok(NewAllocation(res)) + } + + pub(crate) fn buffer_get(self: &Arc, ptr: usize) -> Option { + let mut inner =3D self.inner.lock(); + let mapping =3D inner.mapping.as_mut()?; + let offset =3D ptr.checked_sub(mapping.address)?; + let (size, debug_id, odata) =3D mapping.alloc.reserve_existing(off= set).ok()?; + let mut alloc =3D Allocation::new(self.clone(), debug_id, offset, = size, ptr, false); + if let Some(data) =3D odata { + alloc.set_info(data); + } + Some(alloc) + } + + pub(crate) fn buffer_raw_free(&self, ptr: usize) { + let mut inner =3D self.inner.lock(); + if let Some(ref mut mapping) =3D &mut inner.mapping { + let offset =3D match ptr.checked_sub(mapping.address) { + Some(offset) =3D> offset, + None =3D> return, + }; + + let freed_range =3D match mapping.alloc.reservation_abort(offs= et) { + Ok(freed_range) =3D> freed_range, + Err(_) =3D> { + pr_warn!( + "Pointer {:x} failed to free, base =3D {:x}\n", + ptr, + mapping.address + ); + return; + } + }; + + // No more allocations in this range. 
Mark them as not in use. + // + // Must be done before we release the lock so that `use_range`= is not used on these + // indices until `stop_using_range` returns. + self.pages + .stop_using_range(freed_range.start_page_idx, freed_range.= end_page_idx); + } + } + + pub(crate) fn buffer_make_freeable(&self, offset: usize, mut data: Opt= ion) { + let mut inner =3D self.inner.lock(); + if let Some(ref mut mapping) =3D &mut inner.mapping { + if mapping.alloc.reservation_commit(offset, &mut data).is_err(= ) { + pr_warn!("Offset {} failed to be marked freeable\n", offse= t); + } + } + } + + fn create_mapping(&self, vma: &mm::virt::VmaNew) -> Result { + use kernel::page::PAGE_SIZE; + let size =3D usize::min(vma.end() - vma.start(), bindings::SZ_4M a= s usize); + let mapping =3D Mapping::new(vma.start(), size); + let page_count =3D self.pages.register_with_vma(vma)?; + if page_count * PAGE_SIZE !=3D size { + return Err(EINVAL); + } + + // Save range allocator for later. + self.inner.lock().mapping =3D Some(mapping); + + Ok(()) + } + + fn version(&self, data: UserSlice) -> Result { + data.writer().write(&BinderVersion::current()) + } + + pub(crate) fn register_thread(&self) -> bool { + self.inner.lock().register_thread() + } + + fn remove_thread(&self, thread: Arc) { + self.inner.lock().threads.remove(&thread.id); + thread.release(); + } + + fn set_max_threads(&self, max: u32) { + self.inner.lock().max_threads =3D max; + } + + fn set_oneway_spam_detection_enabled(&self, enabled: u32) { + self.inner.lock().oneway_spam_detection_enabled =3D enabled !=3D 0; + } + + pub(crate) fn is_oneway_spam_detection_enabled(&self) -> bool { + self.inner.lock().oneway_spam_detection_enabled + } + + fn get_node_debug_info(&self, data: UserSlice) -> Result { + let (mut reader, mut writer) =3D data.reader_writer(); + + // Read the starting point. + let ptr =3D reader.read::()?.ptr; + let mut out =3D BinderNodeDebugInfo::default(); + + { + let inner =3D self.inner.lock(); + for (node_ptr, node) in &inner.nodes { + if *node_ptr > ptr { + node.populate_debug_info(&mut out, &inner); + break; + } + } + } + + writer.write(&out) + } + + fn get_node_info_from_ref(&self, data: UserSlice) -> Result { + let (mut reader, mut writer) =3D data.reader_writer(); + let mut out =3D reader.read::()?; + + if out.strong_count !=3D 0 + || out.weak_count !=3D 0 + || out.reserved1 !=3D 0 + || out.reserved2 !=3D 0 + || out.reserved3 !=3D 0 + { + return Err(EINVAL); + } + + // Only the context manager is allowed to use this ioctl. + if !self.inner.lock().is_manager { + return Err(EPERM); + } + + { + let mut node_refs =3D self.node_refs.lock(); + let node_info =3D node_refs.by_handle.get_mut(&out.handle).ok_= or(ENOENT)?; + let node_ref =3D node_info.node_ref(); + let owner_inner =3D node_ref.node.owner.inner.lock(); + node_ref.node.populate_counts(&mut out, &owner_inner); + } + + // Write the result back. + writer.write(&out) + } + + pub(crate) fn needs_thread(&self) -> bool { + let mut inner =3D self.inner.lock(); + let ret =3D inner.requested_thread_count =3D=3D 0 + && inner.ready_threads.is_empty() + && inner.started_thread_count < inner.max_threads; + if ret { + inner.requested_thread_count +=3D 1 + } + ret + } + + pub(crate) fn request_death( + self: &Arc, + reader: &mut UserSliceReader, + thread: &Thread, + ) -> Result { + let handle: u32 =3D reader.read()?; + let cookie: u64 =3D reader.read()?; + + // Queue BR_ERROR if we can't allocate memory for the death notifi= cation. 
+ let death =3D UniqueArc::new_uninit(GFP_KERNEL).inspect_err(|_| { + thread.push_return_work(BR_ERROR); + })?; + let mut refs =3D self.node_refs.lock(); + let Some(info) =3D refs.by_handle.get_mut(&handle) else { + pr_warn!("BC_REQUEST_DEATH_NOTIFICATION invalid ref {handle}\n= "); + return Ok(()); + }; + + // Nothing to do if there is already a death notification request = for this handle. + if info.death().is_some() { + pr_warn!("BC_REQUEST_DEATH_NOTIFICATION death notification alr= eady set\n"); + return Ok(()); + } + + let death =3D { + let death_init =3D NodeDeath::new(info.node_ref().node.clone()= , self.clone(), cookie); + match death.pin_init_with(death_init) { + Ok(death) =3D> death, + // error is infallible + Err(err) =3D> match err {}, + } + }; + + // Register the death notification. + { + let owner =3D info.node_ref2().node.owner.clone(); + let mut owner_inner =3D owner.inner.lock(); + if owner_inner.is_dead { + let death =3D Arc::from(death); + *info.death() =3D Some(death.clone()); + drop(owner_inner); + death.set_dead(); + } else { + let death =3D ListArc::from(death); + *info.death() =3D Some(death.clone_arc()); + info.node_ref().node.add_death(death, &mut owner_inner); + } + } + Ok(()) + } + + pub(crate) fn clear_death(&self, reader: &mut UserSliceReader, thread:= &Thread) -> Result { + let handle: u32 =3D reader.read()?; + let cookie: u64 =3D reader.read()?; + + let mut refs =3D self.node_refs.lock(); + let Some(info) =3D refs.by_handle.get_mut(&handle) else { + pr_warn!("BC_CLEAR_DEATH_NOTIFICATION invalid ref {handle}\n"); + return Ok(()); + }; + + let Some(death) =3D info.death().take() else { + pr_warn!("BC_CLEAR_DEATH_NOTIFICATION death notification not a= ctive\n"); + return Ok(()); + }; + if death.cookie !=3D cookie { + *info.death() =3D Some(death); + pr_warn!("BC_CLEAR_DEATH_NOTIFICATION death notification cooki= e mismatch\n"); + return Ok(()); + } + + // Update state and determine if we need to queue a work item. We = only need to do it when + // the node is not dead or if the user already completed the death= notification. + if death.set_cleared(false) { + if let Some(death) =3D ListArc::try_from_arc_or_drop(death) { + let _ =3D thread.push_work_if_looper(death); + } + } + + Ok(()) + } + + pub(crate) fn dead_binder_done(&self, cookie: u64, thread: &Thread) { + if let Some(death) =3D self.inner.lock().pull_delivered_death(cook= ie) { + death.set_notification_done(thread); + } + } + + /// Locks the spinlock and move the `nodes` rbtree out. + /// + /// This allows you to iterate through `nodes` while also allowing you= to give other parts of + /// the codebase exclusive access to `ProcessInner`. + pub(crate) fn lock_with_nodes(&self) -> WithNodes<'_> { + let mut inner =3D self.inner.lock(); + WithNodes { + nodes: take(&mut inner.nodes), + inner, + } + } + + fn deferred_flush(&self) { + let inner =3D self.inner.lock(); + for thread in inner.threads.values() { + thread.exit_looper(); + } + } + + fn deferred_release(self: Arc) { + let is_manager =3D { + let mut inner =3D self.inner.lock(); + inner.is_dead =3D true; + inner.is_frozen =3D false; + inner.sync_recv =3D false; + inner.async_recv =3D false; + inner.is_manager + }; + + if is_manager { + self.ctx.unset_manager_node(); + } + + self.ctx.deregister_process(&self); + + let binderfs_file =3D self.inner.lock().binderfs_file.take(); + drop(binderfs_file); + + // Release threads. 
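+ //
+ // The thread list is taken out of `ProcessInner` while holding the lock, but the threads
+ // are only released after the lock has been dropped.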
+ let threads =3D { + let mut inner =3D self.inner.lock(); + let threads =3D take(&mut inner.threads); + let ready =3D take(&mut inner.ready_threads); + drop(inner); + drop(ready); + + for thread in threads.values() { + thread.release(); + } + threads + }; + + // Release nodes. + { + while let Some(node) =3D { + let mut lock =3D self.inner.lock(); + lock.nodes.cursor_front().map(|c| c.remove_current().1) + } { + node.to_key_value().1.release(); + } + } + + // Clean up death listeners and remove nodes from external node in= fo lists. + for info in self.node_refs.lock().by_handle.values_mut() { + // SAFETY: We are removing the `NodeRefInfo` from the right no= de. + unsafe { info.node_ref2().node.remove_node_info(info) }; + + // Remove all death notifications from the nodes (that belong = to a different process). + let death =3D if let Some(existing) =3D info.death().take() { + existing + } else { + continue; + }; + death.set_cleared(false); + } + + // Clean up freeze listeners. + let freeze_listeners =3D take(&mut self.node_refs.lock().freeze_li= steners); + for listener in freeze_listeners.values() { + listener.on_process_exit(&self); + } + drop(freeze_listeners); + + // Release refs on foreign nodes. + { + let mut refs =3D self.node_refs.lock(); + let by_handle =3D take(&mut refs.by_handle); + let by_node =3D take(&mut refs.by_node); + drop(refs); + drop(by_node); + drop(by_handle); + } + + // Cancel all pending work items. + while let Some(work) =3D self.get_work() { + work.into_arc().cancel(); + } + + let delivered_deaths =3D take(&mut self.inner.lock().delivered_dea= ths); + drop(delivered_deaths); + + // Free any resources kept alive by allocated buffers. + let omapping =3D self.inner.lock().mapping.take(); + if let Some(mut mapping) =3D omapping { + let address =3D mapping.address; + mapping + .alloc + .take_for_each(|offset, size, debug_id, odata| { + let ptr =3D offset + address; + pr_warn!( + "{}: removing orphan mapping {offset}:{size}\n", + self.pid_in_current_ns() + ); + let mut alloc =3D + Allocation::new(self.clone(), debug_id, offset, si= ze, ptr, false); + if let Some(data) =3D odata { + alloc.set_info(data); + } + drop(alloc) + }); + } + + // calls to synchronize_rcu() in thread drop will happen here + drop(threads); + } + + pub(crate) fn drop_outstanding_txn(&self) { + let wake =3D { + let mut inner =3D self.inner.lock(); + if inner.outstanding_txns =3D=3D 0 { + pr_err!("outstanding_txns underflow"); + return; + } + inner.outstanding_txns -=3D 1; + inner.is_frozen && inner.outstanding_txns =3D=3D 0 + }; + + if wake { + self.freeze_wait.notify_all(); + } + } + + pub(crate) fn ioctl_freeze(&self, info: &BinderFreezeInfo) -> Result { + if info.enable =3D=3D 0 { + let msgs =3D self.prepare_freeze_messages()?; + let mut inner =3D self.inner.lock(); + inner.sync_recv =3D false; + inner.async_recv =3D false; + inner.is_frozen =3D false; + drop(inner); + msgs.send_messages(); + return Ok(()); + } + + let mut inner =3D self.inner.lock(); + inner.sync_recv =3D false; + inner.async_recv =3D false; + inner.is_frozen =3D true; + + if info.timeout_ms > 0 { + let mut jiffies =3D kernel::time::msecs_to_jiffies(info.timeou= t_ms); + while jiffies > 0 { + if inner.outstanding_txns =3D=3D 0 { + break; + } + + match self + .freeze_wait + .wait_interruptible_timeout(&mut inner, jiffies) + { + CondVarTimeoutResult::Signal { .. 
} =3D> { + inner.is_frozen =3D false; + return Err(ERESTARTSYS); + } + CondVarTimeoutResult::Woken { jiffies: remaining } =3D= > { + jiffies =3D remaining; + } + CondVarTimeoutResult::Timeout =3D> { + jiffies =3D 0; + } + } + } + } + + if inner.txns_pending_locked() { + inner.is_frozen =3D false; + Err(EAGAIN) + } else { + drop(inner); + match self.prepare_freeze_messages() { + Ok(batch) =3D> { + batch.send_messages(); + Ok(()) + } + Err(kernel::alloc::AllocError) =3D> { + self.inner.lock().is_frozen =3D false; + Err(ENOMEM) + } + } + } + } +} + +fn get_frozen_status(data: UserSlice) -> Result { + let (mut reader, mut writer) =3D data.reader_writer(); + + let mut info =3D reader.read::()?; + info.sync_recv =3D 0; + info.async_recv =3D 0; + let mut found =3D false; + + for ctx in crate::context::get_all_contexts()? { + ctx.for_each_proc(|proc| { + if proc.task.pid() =3D=3D info.pid as _ { + found =3D true; + let inner =3D proc.inner.lock(); + let txns_pending =3D inner.txns_pending_locked(); + info.async_recv |=3D inner.async_recv as u32; + info.sync_recv |=3D inner.sync_recv as u32; + info.sync_recv |=3D (txns_pending as u32) << 1; + } + }); + } + + if found { + writer.write(&info)?; + Ok(()) + } else { + Err(EINVAL) + } +} + +fn ioctl_freeze(reader: &mut UserSliceReader) -> Result { + let info =3D reader.read::()?; + + // Very unlikely for there to be more than 3, since a process normally= uses at most binder and + // hwbinder. + let mut procs =3D KVec::with_capacity(3, GFP_KERNEL)?; + + let ctxs =3D crate::context::get_all_contexts()?; + for ctx in ctxs { + for proc in ctx.get_procs_with_pid(info.pid as i32)? { + procs.push(proc, GFP_KERNEL)?; + } + } + + for proc in procs { + proc.ioctl_freeze(&info)?; + } + Ok(()) +} + +/// The ioctl handler. +impl Process { + /// Ioctls that are write-only from the perspective of userspace. + /// + /// The kernel will only read from the pointer that userspace provided= to us. + fn ioctl_write_only( + this: ArcBorrow<'_, Process>, + _file: &File, + cmd: u32, + reader: &mut UserSliceReader, + ) -> Result { + let thread =3D this.get_current_thread()?; + match cmd { + uapi::BINDER_SET_MAX_THREADS =3D> this.set_max_threads(reader.= read()?), + uapi::BINDER_THREAD_EXIT =3D> this.remove_thread(thread), + uapi::BINDER_SET_CONTEXT_MGR =3D> this.set_as_manager(None, &t= hread)?, + uapi::BINDER_SET_CONTEXT_MGR_EXT =3D> { + this.set_as_manager(Some(reader.read()?), &thread)? + } + uapi::BINDER_ENABLE_ONEWAY_SPAM_DETECTION =3D> { + this.set_oneway_spam_detection_enabled(reader.read()?) + } + uapi::BINDER_FREEZE =3D> ioctl_freeze(reader)?, + _ =3D> return Err(EINVAL), + } + Ok(()) + } + + /// Ioctls that are read/write from the perspective of userspace. + /// + /// The kernel will both read from and write to the pointer that users= pace provided to us. 
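+ ///
+ /// The size of the user buffer is taken from the `_IOC_SIZE` bits of the ioctl command.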
+ fn ioctl_write_read( + this: ArcBorrow<'_, Process>, + file: &File, + cmd: u32, + data: UserSlice, + ) -> Result { + let thread =3D this.get_current_thread()?; + let blocking =3D (file.flags() & file::flags::O_NONBLOCK) =3D=3D 0; + match cmd { + uapi::BINDER_WRITE_READ =3D> thread.write_read(data, blocking)= ?, + uapi::BINDER_GET_NODE_DEBUG_INFO =3D> this.get_node_debug_info= (data)?, + uapi::BINDER_GET_NODE_INFO_FOR_REF =3D> this.get_node_info_fro= m_ref(data)?, + uapi::BINDER_VERSION =3D> this.version(data)?, + uapi::BINDER_GET_FROZEN_INFO =3D> get_frozen_status(data)?, + uapi::BINDER_GET_EXTENDED_ERROR =3D> thread.get_extended_error= (data)?, + _ =3D> return Err(EINVAL), + } + Ok(()) + } +} + +/// The file operations supported by `Process`. +impl Process { + pub(crate) fn open(ctx: ArcBorrow<'_, Context>, file: &File) -> Result= > { + Self::new(ctx.into(), ARef::from(file.cred())) + } + + pub(crate) fn release(this: Arc, _file: &File) { + let binderfs_file; + let should_schedule; + { + let mut inner =3D this.inner.lock(); + should_schedule =3D inner.defer_work =3D=3D 0; + inner.defer_work |=3D PROC_DEFER_RELEASE; + binderfs_file =3D inner.binderfs_file.take(); + } + + if should_schedule { + // Ignore failures to schedule to the workqueue. Those just me= an that we're already + // scheduled for execution. + let _ =3D workqueue::system().enqueue(this); + } + + drop(binderfs_file); + } + + pub(crate) fn flush(this: ArcBorrow<'_, Process>) -> Result { + let should_schedule; + { + let mut inner =3D this.inner.lock(); + should_schedule =3D inner.defer_work =3D=3D 0; + inner.defer_work |=3D PROC_DEFER_FLUSH; + } + + if should_schedule { + // Ignore failures to schedule to the workqueue. Those just me= an that we're already + // scheduled for execution. + let _ =3D workqueue::system().enqueue(Arc::from(this)); + } + Ok(()) + } + + pub(crate) fn ioctl(this: ArcBorrow<'_, Process>, file: &File, cmd: u3= 2, arg: usize) -> Result { + use kernel::ioctl::{_IOC_DIR, _IOC_SIZE}; + use kernel::uapi::{_IOC_READ, _IOC_WRITE}; + + crate::trace::trace_ioctl(cmd, arg); + + let user_slice =3D UserSlice::new(UserPtr::from_addr(arg), _IOC_SI= ZE(cmd)); + + const _IOC_READ_WRITE: u32 =3D _IOC_READ | _IOC_WRITE; + + match _IOC_DIR(cmd) { + _IOC_WRITE =3D> Self::ioctl_write_only(this, file, cmd, &mut u= ser_slice.reader()), + _IOC_READ_WRITE =3D> Self::ioctl_write_read(this, file, cmd, u= ser_slice), + _ =3D> Err(EINVAL), + } + } + + pub(crate) fn compat_ioctl( + this: ArcBorrow<'_, Process>, + file: &File, + cmd: u32, + arg: usize, + ) -> Result { + Self::ioctl(this, file, cmd, arg) + } + + pub(crate) fn mmap( + this: ArcBorrow<'_, Process>, + _file: &File, + vma: &mm::virt::VmaNew, + ) -> Result { + // We don't allow mmap to be used in a different process. + if !core::ptr::eq(kernel::current!().group_leader(), &*this.task) { + return Err(EINVAL); + } + if vma.start() =3D=3D 0 { + return Err(EINVAL); + } + + vma.try_clear_maywrite().map_err(|_| EPERM)?; + vma.set_dontcopy(); + vma.set_mixedmap(); + + // TODO: Set ops. We need to learn when the user unmaps so that we= can stop using it. 
+ this.create_mapping(vma) + } + + pub(crate) fn poll( + this: ArcBorrow<'_, Process>, + file: &File, + table: PollTable<'_>, + ) -> Result { + let thread =3D this.get_current_thread()?; + let (from_proc, mut mask) =3D thread.poll(file, table); + if mask =3D=3D 0 && from_proc && !this.inner.lock().work.is_empty(= ) { + mask |=3D bindings::POLLIN; + } + Ok(mask) + } +} + +/// Represents that a thread has registered with the `ready_threads` list = of its process. +/// +/// The destructor of this type will unregister the thread from the list o= f ready threads. +pub(crate) struct Registration<'a> { + thread: &'a Arc, +} + +impl<'a> Registration<'a> { + fn new(thread: &'a Arc, guard: &mut Guard<'_, ProcessInner, Sp= inLockBackend>) -> Self { + assert!(core::ptr::eq(&thread.process.inner, guard.lock_ref())); + // INVARIANT: We are pushing this thread to the right `ready_threa= ds` list. + if let Ok(list_arc) =3D ListArc::try_from_arc(thread.clone()) { + guard.ready_threads.push_front(list_arc); + } else { + // It is an error to hit this branch, and it should not be rea= chable. We try to do + // something reasonable when the failure path happens. Most li= kely, the thread in + // question will sleep forever. + pr_err!("Same thread registered with `ready_threads` twice."); + } + Self { thread } + } +} + +impl Drop for Registration<'_> { + fn drop(&mut self) { + let mut inner =3D self.thread.process.inner.lock(); + // SAFETY: The thread has the invariant that we never push it to a= ny other linked list than + // the `ready_threads` list of its parent process. Therefore, the = thread is either in that + // list, or in no list. + unsafe { inner.ready_threads.remove(self.thread) }; + } +} + +pub(crate) struct WithNodes<'a> { + pub(crate) inner: Guard<'a, ProcessInner, SpinLockBackend>, + pub(crate) nodes: RBTree>, +} + +impl Drop for WithNodes<'_> { + fn drop(&mut self) { + core::mem::swap(&mut self.nodes, &mut self.inner.nodes); + if self.nodes.iter().next().is_some() { + pr_err!("nodes array was modified while using lock_with_nodes\= n"); + } + } +} + +pub(crate) enum GetWorkOrRegister<'a> { + Work(DLArc), + Register(Registration<'a>), +} diff --git a/drivers/android/binder/range_alloc/array.rs b/drivers/android/= binder/range_alloc/array.rs new file mode 100644 index 0000000000000000000000000000000000000000..07e1dec2ce630f57333f7bdb067= 0645dfc4ca0f3 --- /dev/null +++ b/drivers/android/binder/range_alloc/array.rs @@ -0,0 +1,251 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. + +use kernel::{ + page::{PAGE_MASK, PAGE_SIZE}, + prelude::*, + seq_file::SeqFile, + seq_print, + task::Pid, +}; + +use crate::range_alloc::{DescriptorState, FreedRange, Range}; + +/// Keeps track of allocations in a process' mmap. +/// +/// Each process has an mmap where the data for incoming transactions will= be placed. This struct +/// keeps track of allocations made in the mmap. For each allocation, we s= tore a descriptor that +/// has metadata related to the allocation. We also keep track of availabl= e free space. +pub(super) struct ArrayRangeAllocator { + /// This stores all ranges that are allocated. Unlike the tree based a= llocator, we do *not* + /// store the free ranges. + /// + /// Sorted by offset. + pub(super) ranges: KVec>, + size: usize, + free_oneway_space: usize, +} + +struct FindEmptyRes { + /// Which index in `ranges` should we insert the new range at? + /// + /// Inserting the new range at this index keeps `ranges` sorted. 
+ insert_at_idx: usize, + /// Which offset should we insert the new range at? + insert_at_offset: usize, +} + +impl ArrayRangeAllocator { + pub(crate) fn new(size: usize, alloc: EmptyArrayAlloc) -> Self { + Self { + ranges: alloc.ranges, + size, + free_oneway_space: size / 2, + } + } + + pub(crate) fn free_oneway_space(&self) -> usize { + self.free_oneway_space + } + + pub(crate) fn count_buffers(&self) -> usize { + self.ranges.len() + } + + pub(crate) fn total_size(&self) -> usize { + self.size + } + + pub(crate) fn is_full(&self) -> bool { + self.ranges.len() =3D=3D self.ranges.capacity() + } + + pub(crate) fn debug_print(&self, m: &SeqFile) -> Result<()> { + for range in &self.ranges { + seq_print!( + m, + " buffer {}: {} size {} pid {} oneway {}", + 0, + range.offset, + range.size, + range.state.pid(), + range.state.is_oneway(), + ); + if let DescriptorState::Reserved(_) =3D range.state { + seq_print!(m, " reserved\n"); + } else { + seq_print!(m, " allocated\n"); + } + } + Ok(()) + } + + /// Find somewhere to put a new range. + /// + /// Unlike the tree implementation, we do not bother to find the small= est gap. The idea is that + /// fragmentation isn't a big issue when we don't have many ranges. + /// + /// Returns the index that the new range should have in `self.ranges` = after insertion. + fn find_empty_range(&self, size: usize) -> Option { + let after_last_range =3D self.ranges.last().map(Range::endpoint).u= nwrap_or(0); + + if size <=3D self.total_size() - after_last_range { + // We can put the range at the end, so just do that. + Some(FindEmptyRes { + insert_at_idx: self.ranges.len(), + insert_at_offset: after_last_range, + }) + } else { + let mut end_of_prev =3D 0; + for (i, range) in self.ranges.iter().enumerate() { + // Does it fit before the i'th range? + if size <=3D range.offset - end_of_prev { + return Some(FindEmptyRes { + insert_at_idx: i, + insert_at_offset: end_of_prev, + }); + } + end_of_prev =3D range.endpoint(); + } + None + } + } + + pub(crate) fn reserve_new( + &mut self, + debug_id: usize, + size: usize, + is_oneway: bool, + pid: Pid, + ) -> Result { + // Compute new value of free_oneway_space, which is set only on su= ccess. + let new_oneway_space =3D if is_oneway { + match self.free_oneway_space.checked_sub(size) { + Some(new_oneway_space) =3D> new_oneway_space, + None =3D> return Err(ENOSPC), + } + } else { + self.free_oneway_space + }; + + let FindEmptyRes { + insert_at_idx, + insert_at_offset, + } =3D self.find_empty_range(size).ok_or(ENOSPC)?; + self.free_oneway_space =3D new_oneway_space; + + let new_range =3D Range { + offset: insert_at_offset, + size, + state: DescriptorState::new(is_oneway, debug_id, pid), + }; + // Insert the value at the given index to keep the array sorted. + self.ranges + .insert_within_capacity(insert_at_idx, new_range) + .ok() + .unwrap(); + + Ok(insert_at_offset) + } + + pub(crate) fn reservation_abort(&mut self, offset: usize) -> Result { + // This could use a binary search, but linear scans are usually fa= ster for small arrays. + let i =3D self + .ranges + .iter() + .position(|range| range.offset =3D=3D offset) + .ok_or(EINVAL)?; + let range =3D &self.ranges[i]; + + if let DescriptorState::Allocated(_) =3D range.state { + return Err(EPERM); + } + + let size =3D range.size; + let offset =3D range.offset; + + if range.state.is_oneway() { + self.free_oneway_space +=3D size; + } + + // This computes the range of pages that are no longer used by *an= y* allocated range. 
The + // caller will mark them as unused, which means that they can be f= reed if the system comes + // under memory pressure. + let mut freed_range =3D FreedRange::interior_pages(offset, size); + #[expect(clippy::collapsible_if)] // reads better like this + if offset % PAGE_SIZE !=3D 0 { + if i =3D=3D 0 || self.ranges[i - 1].endpoint() <=3D (offset & = PAGE_MASK) { + freed_range.start_page_idx -=3D 1; + } + } + if range.endpoint() % PAGE_SIZE !=3D 0 { + let page_after =3D (range.endpoint() & PAGE_MASK) + PAGE_SIZE; + if i + 1 =3D=3D self.ranges.len() || page_after <=3D self.rang= es[i + 1].offset { + freed_range.end_page_idx +=3D 1; + } + } + + self.ranges.remove(i)?; + Ok(freed_range) + } + + pub(crate) fn reservation_commit(&mut self, offset: usize, data: &mut = Option) -> Result { + // This could use a binary search, but linear scans are usually fa= ster for small arrays. + let range =3D self + .ranges + .iter_mut() + .find(|range| range.offset =3D=3D offset) + .ok_or(ENOENT)?; + + let DescriptorState::Reserved(reservation) =3D &range.state else { + return Err(ENOENT); + }; + + range.state =3D DescriptorState::Allocated(reservation.clone().all= ocate(data.take())); + Ok(()) + } + + pub(crate) fn reserve_existing(&mut self, offset: usize) -> Result<(us= ize, usize, Option)> { + // This could use a binary search, but linear scans are usually fa= ster for small arrays. + let range =3D self + .ranges + .iter_mut() + .find(|range| range.offset =3D=3D offset) + .ok_or(ENOENT)?; + + let DescriptorState::Allocated(allocation) =3D &mut range.state el= se { + return Err(ENOENT); + }; + + let data =3D allocation.take(); + let debug_id =3D allocation.reservation.debug_id; + range.state =3D DescriptorState::Reserved(allocation.reservation.c= lone()); + Ok((range.size, debug_id, data)) + } + + pub(crate) fn take_for_each)>(&mu= t self, callback: F) { + for range in self.ranges.iter_mut() { + if let DescriptorState::Allocated(allocation) =3D &mut range.s= tate { + callback( + range.offset, + range.size, + allocation.reservation.debug_id, + allocation.data.take(), + ); + } + } + } +} + +pub(crate) struct EmptyArrayAlloc { + ranges: KVec>, +} + +impl EmptyArrayAlloc { + pub(crate) fn try_new(capacity: usize) -> Result { + Ok(Self { + ranges: KVec::with_capacity(capacity, GFP_KERNEL)?, + }) + } +} diff --git a/drivers/android/binder/range_alloc/mod.rs b/drivers/android/bi= nder/range_alloc/mod.rs new file mode 100644 index 0000000000000000000000000000000000000000..2301e2bc1a1fcdd163a96ac5113= d0fb48a72bb90 --- /dev/null +++ b/drivers/android/binder/range_alloc/mod.rs @@ -0,0 +1,329 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. 
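// Standalone sketch of the freed-page computation used by
// `reservation_abort` above: only pages completely contained in the freed
// byte range can be handed back, so the start is rounded up and the end
// rounded down to a page boundary (the neighbouring pages are then added
// only if no other range still touches them). The 4 KiB page size is an
// assumption for the example only.
const PAGE_SIZE: usize = 4096;

fn interior_pages(offset: usize, size: usize) -> (usize, usize) {
    let start_page_idx = offset.div_ceil(PAGE_SIZE); // round start up
    let end_page_idx = (offset + size) / PAGE_SIZE; // round end down
    (start_page_idx, end_page_idx)
}

fn main() {
    // Freeing bytes [100, 100 + 2 * 4096): only page 1 is fully covered;
    // pages 0 and 2 may still be partially in use by neighbouring ranges.
    assert_eq!(interior_pages(100, 2 * PAGE_SIZE), (1, 2));
}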
+ +use kernel::{page::PAGE_SIZE, prelude::*, seq_file::SeqFile, task::Pid}; + +mod tree; +use self::tree::{FromArrayAllocs, ReserveNewTreeAlloc, TreeRangeAllocator}; + +mod array; +use self::array::{ArrayRangeAllocator, EmptyArrayAlloc}; + +enum DescriptorState { + Reserved(Reservation), + Allocated(Allocation), +} + +impl DescriptorState { + fn new(is_oneway: bool, debug_id: usize, pid: Pid) -> Self { + DescriptorState::Reserved(Reservation { + debug_id, + is_oneway, + pid, + }) + } + + fn pid(&self) -> Pid { + match self { + DescriptorState::Reserved(inner) =3D> inner.pid, + DescriptorState::Allocated(inner) =3D> inner.reservation.pid, + } + } + + fn is_oneway(&self) -> bool { + match self { + DescriptorState::Reserved(inner) =3D> inner.is_oneway, + DescriptorState::Allocated(inner) =3D> inner.reservation.is_on= eway, + } + } +} + +#[derive(Clone)] +struct Reservation { + debug_id: usize, + is_oneway: bool, + pid: Pid, +} + +impl Reservation { + fn allocate(self, data: Option) -> Allocation { + Allocation { + data, + reservation: self, + } + } +} + +struct Allocation { + reservation: Reservation, + data: Option, +} + +impl Allocation { + fn deallocate(self) -> (Reservation, Option) { + (self.reservation, self.data) + } + + fn debug_id(&self) -> usize { + self.reservation.debug_id + } + + fn take(&mut self) -> Option { + self.data.take() + } +} + +/// The array implementation must switch to the tree if it wants to go bey= ond this number of +/// ranges. +const TREE_THRESHOLD: usize =3D 8; + +/// Represents a range of pages that have just become completely free. +#[derive(Copy, Clone)] +pub(crate) struct FreedRange { + pub(crate) start_page_idx: usize, + pub(crate) end_page_idx: usize, +} + +impl FreedRange { + fn interior_pages(offset: usize, size: usize) -> FreedRange { + FreedRange { + // Divide round up + start_page_idx: offset.div_ceil(PAGE_SIZE), + // Divide round down + end_page_idx: (offset + size) / PAGE_SIZE, + } + } +} + +struct Range { + offset: usize, + size: usize, + state: DescriptorState, +} + +impl Range { + fn endpoint(&self) -> usize { + self.offset + self.size + } +} + +pub(crate) struct RangeAllocator { + inner: Impl, +} + +enum Impl { + Empty(usize), + Array(ArrayRangeAllocator), + Tree(TreeRangeAllocator), +} + +impl RangeAllocator { + pub(crate) fn new(size: usize) -> Self { + Self { + inner: Impl::Empty(size), + } + } + + pub(crate) fn free_oneway_space(&self) -> usize { + match &self.inner { + Impl::Empty(size) =3D> size / 2, + Impl::Array(array) =3D> array.free_oneway_space(), + Impl::Tree(tree) =3D> tree.free_oneway_space(), + } + } + + pub(crate) fn count_buffers(&self) -> usize { + match &self.inner { + Impl::Empty(_size) =3D> 0, + Impl::Array(array) =3D> array.count_buffers(), + Impl::Tree(tree) =3D> tree.count_buffers(), + } + } + + pub(crate) fn debug_print(&self, m: &SeqFile) -> Result<()> { + match &self.inner { + Impl::Empty(_size) =3D> Ok(()), + Impl::Array(array) =3D> array.debug_print(m), + Impl::Tree(tree) =3D> tree.debug_print(m), + } + } + + /// Try to reserve a new buffer, using the provided allocation if nece= ssary. 
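// Standalone sketch of the descriptor life cycle defined above: a buffer is
// Reserved when userspace asks for space, becomes Allocated once transaction
// data is attached, and goes back to Reserved when the kernel takes the data
// out again; freeing is only legal from the Reserved state. `State` and the
// String payload are simplified stand-ins for `DescriptorState` and the
// driver's allocation data.
enum State {
    Reserved,
    Allocated(String),
}

impl State {
    fn commit(&mut self, data: String) -> Result<(), &'static str> {
        match self {
            State::Reserved => {
                *self = State::Allocated(data);
                Ok(())
            }
            State::Allocated(_) => Err("already allocated"),
        }
    }

    fn take(&mut self) -> Result<String, &'static str> {
        match std::mem::replace(self, State::Reserved) {
            State::Allocated(data) => Ok(data),
            old => {
                *self = old;
                Err("nothing allocated here")
            }
        }
    }
}

fn main() {
    let mut s = State::Reserved;
    s.commit("payload".into()).unwrap();
    assert_eq!(s.take().unwrap(), "payload");
    s.take().unwrap_err(); // back to Reserved: nothing left to take
}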
+ pub(crate) fn reserve_new(&mut self, mut args: ReserveNewArgs) -> R= esult> { + match &mut self.inner { + Impl::Empty(size) =3D> { + let empty_array =3D match args.empty_array_alloc.take() { + Some(empty_array) =3D> ArrayRangeAllocator::new(*size,= empty_array), + None =3D> { + return Ok(ReserveNew::NeedAlloc(ReserveNewNeedAllo= c { + args, + need_empty_array_alloc: true, + need_new_tree_alloc: false, + need_tree_alloc: false, + })) + } + }; + + self.inner =3D Impl::Array(empty_array); + self.reserve_new(args) + } + Impl::Array(array) if array.is_full() =3D> { + let allocs =3D match args.new_tree_alloc { + Some(ref mut allocs) =3D> allocs, + None =3D> { + return Ok(ReserveNew::NeedAlloc(ReserveNewNeedAllo= c { + args, + need_empty_array_alloc: false, + need_new_tree_alloc: true, + need_tree_alloc: true, + })) + } + }; + + let new_tree =3D + TreeRangeAllocator::from_array(array.total_size(), &mu= t array.ranges, allocs); + + self.inner =3D Impl::Tree(new_tree); + self.reserve_new(args) + } + Impl::Array(array) =3D> { + let offset =3D + array.reserve_new(args.debug_id, args.size, args.is_on= eway, args.pid)?; + Ok(ReserveNew::Success(ReserveNewSuccess { + offset, + oneway_spam_detected: false, + _empty_array_alloc: args.empty_array_alloc, + _new_tree_alloc: args.new_tree_alloc, + _tree_alloc: args.tree_alloc, + })) + } + Impl::Tree(tree) =3D> { + let alloc =3D match args.tree_alloc { + Some(alloc) =3D> alloc, + None =3D> { + return Ok(ReserveNew::NeedAlloc(ReserveNewNeedAllo= c { + args, + need_empty_array_alloc: false, + need_new_tree_alloc: false, + need_tree_alloc: true, + })); + } + }; + let (offset, oneway_spam_detected) =3D + tree.reserve_new(args.debug_id, args.size, args.is_one= way, args.pid, alloc)?; + Ok(ReserveNew::Success(ReserveNewSuccess { + offset, + oneway_spam_detected, + _empty_array_alloc: args.empty_array_alloc, + _new_tree_alloc: args.new_tree_alloc, + _tree_alloc: None, + })) + } + } + } + + /// Deletes the allocations at `offset`. + pub(crate) fn reservation_abort(&mut self, offset: usize) -> Result { + match &mut self.inner { + Impl::Empty(_size) =3D> Err(EINVAL), + Impl::Array(array) =3D> array.reservation_abort(offset), + Impl::Tree(tree) =3D> { + let freed_range =3D tree.reservation_abort(offset)?; + if tree.is_empty() { + self.inner =3D Impl::Empty(tree.total_size()); + } + Ok(freed_range) + } + } + } + + /// Called when an allocation is no longer in use by the kernel. + /// + /// The value in `data` will be stored, if any. A mutable reference is= used to avoid dropping + /// the `T` when an error is returned. + pub(crate) fn reservation_commit(&mut self, offset: usize, data: &mut = Option) -> Result { + match &mut self.inner { + Impl::Empty(_size) =3D> Err(EINVAL), + Impl::Array(array) =3D> array.reservation_commit(offset, data), + Impl::Tree(tree) =3D> tree.reservation_commit(offset, data), + } + } + + /// Called when the kernel starts using an allocation. + /// + /// Returns the size of the existing entry and the data associated wit= h it. + pub(crate) fn reserve_existing(&mut self, offset: usize) -> Result<(us= ize, usize, Option)> { + match &mut self.inner { + Impl::Empty(_size) =3D> Err(EINVAL), + Impl::Array(array) =3D> array.reserve_existing(offset), + Impl::Tree(tree) =3D> tree.reserve_existing(offset), + } + } + + /// Call the provided callback at every allocated region. + /// + /// This destroys the range allocator. Used only during shutdown. 
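// Standalone sketch of the calling convention implemented above: sleeping
// allocations are not allowed while the process spinlock is held, so
// `reserve_new` hands back a "need an allocation" request, and the caller
// allocates with the lock dropped and then retries. `Allocator`, `Alloc` and
// `Outcome` are simplified stand-ins for `RangeAllocator`, the `*Alloc`
// helper types and `ReserveNew`.
use std::sync::Mutex;

struct Alloc; // stands in for the preallocated array / tree nodes

enum Outcome {
    Success(usize), // reserved offset
    NeedAlloc,      // allocate and call again
}

struct Allocator;

impl Allocator {
    fn reserve_new(&mut self, alloc: Option<Alloc>) -> Outcome {
        match alloc {
            Some(_alloc) => Outcome::Success(0), // would consume the preallocation here
            None => Outcome::NeedAlloc,
        }
    }
}

fn reserve(lock: &Mutex<Allocator>) -> usize {
    let mut alloc = None;
    loop {
        let outcome = lock.lock().unwrap().reserve_new(alloc.take());
        // The guard is dropped here, so the allocation below happens
        // without holding the lock.
        match outcome {
            Outcome::Success(offset) => return offset,
            Outcome::NeedAlloc => alloc = Some(Alloc),
        }
    }
}

fn main() {
    let lock = Mutex::new(Allocator);
    assert_eq!(reserve(&lock), 0);
}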
+ pub(crate) fn take_for_each)>(&mu= t self, callback: F) { + match &mut self.inner { + Impl::Empty(_size) =3D> {} + Impl::Array(array) =3D> array.take_for_each(callback), + Impl::Tree(tree) =3D> tree.take_for_each(callback), + } + } +} + +/// The arguments for `reserve_new`. +#[derive(Default)] +pub(crate) struct ReserveNewArgs { + pub(crate) size: usize, + pub(crate) is_oneway: bool, + pub(crate) debug_id: usize, + pub(crate) pid: Pid, + pub(crate) empty_array_alloc: Option>, + pub(crate) new_tree_alloc: Option>, + pub(crate) tree_alloc: Option>, +} + +/// The return type of `ReserveNew`. +pub(crate) enum ReserveNew { + Success(ReserveNewSuccess), + NeedAlloc(ReserveNewNeedAlloc), +} + +/// Returned by `reserve_new` when the reservation was successul. +pub(crate) struct ReserveNewSuccess { + pub(crate) offset: usize, + pub(crate) oneway_spam_detected: bool, + + // If the user supplied an allocation that we did not end up using, th= en we return it here. + // The caller will kfree it outside of the lock. + _empty_array_alloc: Option>, + _new_tree_alloc: Option>, + _tree_alloc: Option>, +} + +/// Returned by `reserve_new` to request the caller to make an allocation = before calling the method +/// again. +pub(crate) struct ReserveNewNeedAlloc { + args: ReserveNewArgs, + need_empty_array_alloc: bool, + need_new_tree_alloc: bool, + need_tree_alloc: bool, +} + +impl ReserveNewNeedAlloc { + /// Make the necessary allocations for another call to `reserve_new`. + pub(crate) fn make_alloc(mut self) -> Result> { + if self.need_empty_array_alloc && self.args.empty_array_alloc.is_n= one() { + self.args.empty_array_alloc =3D Some(EmptyArrayAlloc::try_new(= TREE_THRESHOLD)?); + } + if self.need_new_tree_alloc && self.args.new_tree_alloc.is_none() { + self.args.new_tree_alloc =3D Some(FromArrayAllocs::try_new(TRE= E_THRESHOLD)?); + } + if self.need_tree_alloc && self.args.tree_alloc.is_none() { + self.args.tree_alloc =3D Some(ReserveNewTreeAlloc::try_new()?); + } + Ok(self.args) + } +} diff --git a/drivers/android/binder/range_alloc/tree.rs b/drivers/android/b= inder/range_alloc/tree.rs new file mode 100644 index 0000000000000000000000000000000000000000..7b1a248fcb0269ca92792c38619= 73d4ea69ada1f --- /dev/null +++ b/drivers/android/binder/range_alloc/tree.rs @@ -0,0 +1,488 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. + +use kernel::{ + page::PAGE_SIZE, + prelude::*, + rbtree::{RBTree, RBTreeNode, RBTreeNodeReservation}, + seq_file::SeqFile, + seq_print, + task::Pid, +}; + +use crate::range_alloc::{DescriptorState, FreedRange, Range}; + +/// Keeps track of allocations in a process' mmap. +/// +/// Each process has an mmap where the data for incoming transactions will= be placed. This struct +/// keeps track of allocations made in the mmap. For each allocation, we s= tore a descriptor that +/// has metadata related to the allocation. We also keep track of availabl= e free space. +pub(super) struct TreeRangeAllocator { + /// This collection contains descriptors for *both* ranges containing = an allocation, *and* free + /// ranges between allocations. The free ranges get merged, so there a= re never two free ranges + /// next to each other. + tree: RBTree>, + /// Contains an entry for every free range in `self.tree`. This tree s= orts the ranges by size, + /// letting us look up the smallest range whose size is at least some = lower bound. 
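// Standalone sketch of the two-tree lookup described above: keeping the free
// ranges in a second tree keyed by `(size, offset)` turns "smallest free
// range that still fits" into a single lower-bound query. std's BTreeMap
// stands in for the kernel red-black tree and its `cursor_lower_bound`.
use std::collections::BTreeMap;

fn best_fit(free: &BTreeMap<(usize, usize), ()>, size: usize) -> Option<(usize, usize)> {
    // Smallest key >= (size, 0): the smallest free range with room for `size`.
    free.range((size, 0)..).next().map(|(&key, _)| key)
}

fn main() {
    let mut free = BTreeMap::new();
    free.insert((8, 100), ());  // 8 free bytes at offset 100
    free.insert((32, 200), ()); // 32 free bytes at offset 200
    assert_eq!(best_fit(&free, 16), Some((32, 200)));
    assert_eq!(best_fit(&free, 64), None);
}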
+ free_tree: RBTree, + size: usize, + free_oneway_space: usize, +} + +impl TreeRangeAllocator { + pub(crate) fn from_array( + size: usize, + ranges: &mut KVec>, + alloc: &mut FromArrayAllocs, + ) -> Self { + let mut tree =3D TreeRangeAllocator { + tree: RBTree::new(), + free_tree: RBTree::new(), + size, + free_oneway_space: size / 2, + }; + + let mut free_offset =3D 0; + for range in ranges.drain_all() { + let free_size =3D range.offset - free_offset; + if free_size > 0 { + let free_node =3D alloc.free_tree.pop().unwrap(); + tree.free_tree + .insert(free_node.into_node((free_size, free_offset), = ())); + let tree_node =3D alloc.tree.pop().unwrap(); + tree.tree.insert( + tree_node.into_node(free_offset, Descriptor::new(free_= offset, free_size)), + ); + } + free_offset =3D range.endpoint(); + + if range.state.is_oneway() { + tree.free_oneway_space =3D tree.free_oneway_space.saturati= ng_sub(range.size); + } + + let free_res =3D alloc.free_tree.pop().unwrap(); + let tree_node =3D alloc.tree.pop().unwrap(); + let mut desc =3D Descriptor::new(range.offset, range.size); + desc.state =3D Some((range.state, free_res)); + tree.tree.insert(tree_node.into_node(range.offset, desc)); + } + + // After the last range, we may need a free range. + if free_offset < size { + let free_size =3D size - free_offset; + let free_node =3D alloc.free_tree.pop().unwrap(); + tree.free_tree + .insert(free_node.into_node((free_size, free_offset), ())); + let tree_node =3D alloc.tree.pop().unwrap(); + tree.tree + .insert(tree_node.into_node(free_offset, Descriptor::new(f= ree_offset, free_size))); + } + + tree + } + + pub(crate) fn is_empty(&self) -> bool { + let mut tree_iter =3D self.tree.values(); + // There's always at least one range, because index zero is either= the start of a free or + // allocated range. + let first_value =3D tree_iter.next().unwrap(); + if tree_iter.next().is_some() { + // There are never two free ranges next to each other, so if t= here is more than one + // descriptor, then at least one of them must hold an allocate= d range. + return false; + } + // There is only one descriptor. Return true if it is for a free r= ange. + first_value.state.is_none() + } + + pub(crate) fn total_size(&self) -> usize { + self.size + } + + pub(crate) fn free_oneway_space(&self) -> usize { + self.free_oneway_space + } + + pub(crate) fn count_buffers(&self) -> usize { + self.tree + .values() + .filter(|desc| desc.state.is_some()) + .count() + } + + pub(crate) fn debug_print(&self, m: &SeqFile) -> Result<()> { + for desc in self.tree.values() { + let state =3D match &desc.state { + Some(state) =3D> &state.0, + None =3D> continue, + }; + seq_print!( + m, + " buffer: {} size {} pid {}", + desc.offset, + desc.size, + state.pid(), + ); + if state.is_oneway() { + seq_print!(m, " oneway"); + } + match state { + DescriptorState::Reserved(_res) =3D> { + seq_print!(m, " reserved\n"); + } + DescriptorState::Allocated(_alloc) =3D> { + seq_print!(m, " allocated\n"); + } + } + } + Ok(()) + } + + fn find_best_match(&mut self, size: usize) -> Option<&mut Descriptor> { + let free_cursor =3D self.free_tree.cursor_lower_bound(&(size, 0))?; + let ((_, offset), ()) =3D free_cursor.current(); + self.tree.get_mut(offset) + } + + /// Try to reserve a new buffer, using the provided allocation if nece= ssary. 
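// Standalone sketch of the array-to-tree conversion done by `from_array`
// above: one pass over the offset-sorted allocations yields every free gap
// between them plus the tail gap. A list of n allocations produces at most
// n + 1 gaps, which is why `FromArrayAllocs` preallocates 2 * n + 1 tree
// nodes. The `(offset, size)` tuples are an illustrative stand-in for the
// driver's `Range` values.
fn free_gaps(ranges: &[(usize, usize)], total: usize) -> Vec<(usize, usize)> {
    let mut gaps = Vec::new(); // (offset, size) of every free gap
    let mut free_offset = 0;
    for &(offset, size) in ranges {
        if offset > free_offset {
            gaps.push((free_offset, offset - free_offset));
        }
        free_offset = offset + size;
    }
    if free_offset < total {
        gaps.push((free_offset, total - free_offset));
    }
    gaps
}

fn main() {
    // Allocations at [4, 8) and [8, 12) in a 32-byte area leave the gaps
    // [0, 4) and [12, 32).
    assert_eq!(free_gaps(&[(4, 4), (8, 4)], 32), vec![(0, 4), (12, 20)]);
}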
+ pub(crate) fn reserve_new( + &mut self, + debug_id: usize, + size: usize, + is_oneway: bool, + pid: Pid, + alloc: ReserveNewTreeAlloc, + ) -> Result<(usize, bool)> { + // Compute new value of free_oneway_space, which is set only on su= ccess. + let new_oneway_space =3D if is_oneway { + match self.free_oneway_space.checked_sub(size) { + Some(new_oneway_space) =3D> new_oneway_space, + None =3D> return Err(ENOSPC), + } + } else { + self.free_oneway_space + }; + + // Start detecting spammers once we have less than 20% + // of async space left (which is less than 10% of total + // buffer size). + // + // (This will short-circut, so `low_oneway_space` is + // only called when necessary.) + let oneway_spam_detected =3D + is_oneway && new_oneway_space < self.size / 10 && self.low_one= way_space(pid); + + let (found_size, found_off, tree_node, free_tree_node) =3D match s= elf.find_best_match(size) { + None =3D> { + pr_warn!("ENOSPC from range_alloc.reserve_new - size: {}",= size); + return Err(ENOSPC); + } + Some(desc) =3D> { + let found_size =3D desc.size; + let found_offset =3D desc.offset; + + // In case we need to break up the descriptor + let new_desc =3D Descriptor::new(found_offset + size, foun= d_size - size); + let (tree_node, free_tree_node, desc_node_res) =3D alloc.i= nitialize(new_desc); + + desc.state =3D Some(( + DescriptorState::new(is_oneway, debug_id, pid), + desc_node_res, + )); + desc.size =3D size; + + (found_size, found_offset, tree_node, free_tree_node) + } + }; + self.free_oneway_space =3D new_oneway_space; + self.free_tree.remove(&(found_size, found_off)); + + if found_size !=3D size { + self.tree.insert(tree_node); + self.free_tree.insert(free_tree_node); + } + + Ok((found_off, oneway_spam_detected)) + } + + pub(crate) fn reservation_abort(&mut self, offset: usize) -> Result { + let mut cursor =3D self.tree.cursor_lower_bound(&offset).ok_or_els= e(|| { + pr_warn!( + "EINVAL from range_alloc.reservation_abort - offset: {}", + offset + ); + EINVAL + })?; + + let (_, desc) =3D cursor.current_mut(); + + if desc.offset !=3D offset { + pr_warn!( + "EINVAL from range_alloc.reservation_abort - offset: {}", + offset + ); + return Err(EINVAL); + } + + let (reservation, free_node_res) =3D desc.try_change_state(|state|= match state { + Some((DescriptorState::Reserved(reservation), free_node_res)) = =3D> { + (None, Ok((reservation, free_node_res))) + } + None =3D> { + pr_warn!( + "EINVAL from range_alloc.reservation_abort - offset: {= }", + offset + ); + (None, Err(EINVAL)) + } + allocated =3D> { + pr_warn!( + "EPERM from range_alloc.reservation_abort - offset: {}= ", + offset + ); + (allocated, Err(EPERM)) + } + })?; + + let mut size =3D desc.size; + let mut offset =3D desc.offset; + let free_oneway_space_add =3D if reservation.is_oneway { size } el= se { 0 }; + + self.free_oneway_space +=3D free_oneway_space_add; + + let mut freed_range =3D FreedRange::interior_pages(offset, size); + // Compute how large the next free region needs to be to include o= ne more page in + // the newly freed range. + let add_next_page_needed =3D match (offset + size) % PAGE_SIZE { + 0 =3D> usize::MAX, + unalign =3D> PAGE_SIZE - unalign, + }; + // Compute how large the previous free region needs to be to inclu= de one more page + // in the newly freed range. 
+ let add_prev_page_needed =3D match offset % PAGE_SIZE { + 0 =3D> usize::MAX, + unalign =3D> unalign, + }; + + // Merge next into current if next is free + let remove_next =3D match cursor.peek_next() { + Some((_, next)) if next.state.is_none() =3D> { + if next.size >=3D add_next_page_needed { + freed_range.end_page_idx +=3D 1; + } + self.free_tree.remove(&(next.size, next.offset)); + size +=3D next.size; + true + } + _ =3D> false, + }; + + if remove_next { + let (_, desc) =3D cursor.current_mut(); + desc.size =3D size; + cursor.remove_next(); + } + + // Merge current into prev if prev is free + match cursor.peek_prev_mut() { + Some((_, prev)) if prev.state.is_none() =3D> { + if prev.size >=3D add_prev_page_needed { + freed_range.start_page_idx -=3D 1; + } + // merge previous with current, remove current + self.free_tree.remove(&(prev.size, prev.offset)); + offset =3D prev.offset; + size +=3D prev.size; + prev.size =3D size; + cursor.remove_current(); + } + _ =3D> {} + }; + + self.free_tree + .insert(free_node_res.into_node((size, offset), ())); + + Ok(freed_range) + } + + pub(crate) fn reservation_commit(&mut self, offset: usize, data: &mut = Option) -> Result { + let desc =3D self.tree.get_mut(&offset).ok_or(ENOENT)?; + + desc.try_change_state(|state| match state { + Some((DescriptorState::Reserved(reservation), free_node_res)) = =3D> ( + Some(( + DescriptorState::Allocated(reservation.allocate(data.t= ake())), + free_node_res, + )), + Ok(()), + ), + other =3D> (other, Err(ENOENT)), + }) + } + + /// Takes an entry at the given offset from [`DescriptorState::Allocat= ed`] to + /// [`DescriptorState::Reserved`]. + /// + /// Returns the size of the existing entry and the data associated wit= h it. + pub(crate) fn reserve_existing(&mut self, offset: usize) -> Result<(us= ize, usize, Option)> { + let desc =3D self.tree.get_mut(&offset).ok_or_else(|| { + pr_warn!( + "ENOENT from range_alloc.reserve_existing - offset: {}", + offset + ); + ENOENT + })?; + + let (debug_id, data) =3D desc.try_change_state(|state| match state= { + Some((DescriptorState::Allocated(allocation), free_node_res)) = =3D> { + let (reservation, data) =3D allocation.deallocate(); + let debug_id =3D reservation.debug_id; + ( + Some((DescriptorState::Reserved(reservation), free_nod= e_res)), + Ok((debug_id, data)), + ) + } + other =3D> { + pr_warn!( + "ENOENT from range_alloc.reserve_existing - offset: {}= ", + offset + ); + (other, Err(ENOENT)) + } + })?; + + Ok((desc.size, debug_id, data)) + } + + /// Call the provided callback at every allocated region. + /// + /// This destroys the range allocator. Used only during shutdown. + pub(crate) fn take_for_each)>(&mu= t self, callback: F) { + for (_, desc) in self.tree.iter_mut() { + if let Some((DescriptorState::Allocated(allocation), _)) =3D &= mut desc.state { + callback( + desc.offset, + desc.size, + allocation.debug_id(), + allocation.take(), + ); + } + } + } + + /// Find the amount and size of buffers allocated by the current calle= r. + /// + /// The idea is that once we cross the threshold, whoever is responsib= le + /// for the low async space is likely to try to send another async tra= nsaction, + /// and at some point we'll catch them in the act. This is more effic= ient + /// than keeping a map per pid. 
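// The thresholds above, spelled out with concrete numbers. The oneway
// (async) budget is half of the mapping, so "less than 10% of the total
// size left" is the same as "less than 20% of the async budget", and a
// single pid crosses the per-sender limit once it holds more than 50 oneway
// buffers or more than a quarter of the total size (half of the budget).
// The 1 MiB mapping size is only an example value.
fn main() {
    let total_size: usize = 1 << 20; // hypothetical 1 MiB mapping
    let oneway_budget = total_size / 2; // initial free_oneway_space

    // Spam detection starts below this much remaining async space:
    assert_eq!(total_size / 10, oneway_budget / 5);

    // A pid crosses the per-sender threshold at this many allocated bytes:
    assert_eq!(total_size / 4, oneway_budget / 2);
}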
+ fn low_oneway_space(&self, calling_pid: Pid) -> bool { + let mut total_alloc_size =3D 0; + let mut num_buffers =3D 0; + for (_, desc) in self.tree.iter() { + if let Some((state, _)) =3D &desc.state { + if state.is_oneway() && state.pid() =3D=3D calling_pid { + total_alloc_size +=3D desc.size; + num_buffers +=3D 1; + } + } + } + + // Warn if this pid has more than 50 transactions, or more than 50= % of + // async space (which is 25% of total buffer size). Oneway spam is= only + // detected when the threshold is exceeded. + num_buffers > 50 || total_alloc_size > self.size / 4 + } +} + +type TreeDescriptorState =3D (DescriptorState, FreeNodeRes); +struct Descriptor { + size: usize, + offset: usize, + state: Option>, +} + +impl Descriptor { + fn new(offset: usize, size: usize) -> Self { + Self { + size, + offset, + state: None, + } + } + + fn try_change_state(&mut self, f: F) -> Result + where + F: FnOnce(Option>) -> (Option>, Result), + { + let (new_state, result) =3D f(self.state.take()); + self.state =3D new_state; + result + } +} + +// (Descriptor.size, Descriptor.offset) +type FreeKey =3D (usize, usize); +type FreeNodeRes =3D RBTreeNodeReservation; + +/// An allocation for use by `reserve_new`. +pub(crate) struct ReserveNewTreeAlloc { + tree_node_res: RBTreeNodeReservation>, + free_tree_node_res: FreeNodeRes, + desc_node_res: FreeNodeRes, +} + +impl ReserveNewTreeAlloc { + pub(crate) fn try_new() -> Result { + let tree_node_res =3D RBTreeNodeReservation::new(GFP_KERNEL)?; + let free_tree_node_res =3D RBTreeNodeReservation::new(GFP_KERNEL)?; + let desc_node_res =3D RBTreeNodeReservation::new(GFP_KERNEL)?; + Ok(Self { + tree_node_res, + free_tree_node_res, + desc_node_res, + }) + } + + fn initialize( + self, + desc: Descriptor, + ) -> ( + RBTreeNode>, + RBTreeNode, + FreeNodeRes, + ) { + let size =3D desc.size; + let offset =3D desc.offset; + ( + self.tree_node_res.into_node(offset, desc), + self.free_tree_node_res.into_node((size, offset), ()), + self.desc_node_res, + ) + } +} + +/// An allocation for creating a tree from an `ArrayRangeAllocator`. +pub(crate) struct FromArrayAllocs { + tree: KVec>>, + free_tree: KVec>, +} + +impl FromArrayAllocs { + pub(crate) fn try_new(len: usize) -> Result { + let num_descriptors =3D 2 * len + 1; + + let mut tree =3D KVec::with_capacity(num_descriptors, GFP_KERNEL)?; + for _ in 0..num_descriptors { + tree.push(RBTreeNodeReservation::new(GFP_KERNEL)?, GFP_KERNEL)= ?; + } + + let mut free_tree =3D KVec::with_capacity(num_descriptors, GFP_KER= NEL)?; + for _ in 0..num_descriptors { + free_tree.push(RBTreeNodeReservation::new(GFP_KERNEL)?, GFP_KE= RNEL)?; + } + + Ok(Self { tree, free_tree }) + } +} diff --git a/drivers/android/binder/rust_binder.h b/drivers/android/binder/= rust_binder.h new file mode 100644 index 0000000000000000000000000000000000000000..31806890ed1a278793ae7178f9d= 76ca4d591a954 --- /dev/null +++ b/drivers/android/binder/rust_binder.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2025 Google, Inc. + */ + +#ifndef _LINUX_RUST_BINDER_H +#define _LINUX_RUST_BINDER_H + +#include +#include + +/* + * These symbols are exposed by `rust_binderfs.c` and exist here so that R= ust + * Binder can call them. 
+ */ +int init_rust_binderfs(void); + +struct dentry; +struct inode; +struct dentry *rust_binderfs_create_proc_file(struct inode *nodp, int pid); +void rust_binderfs_remove_file(struct dentry *dentry); + +#endif diff --git a/drivers/android/binder/rust_binder_events.c b/drivers/android/= binder/rust_binder_events.c new file mode 100644 index 0000000000000000000000000000000000000000..488b1470060cc43f24345e9c323= 36134f96b0da0 --- /dev/null +++ b/drivers/android/binder/rust_binder_events.c @@ -0,0 +1,59 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* rust_binder_events.c + * + * Rust Binder tracepoints. + * + * Copyright 2025 Google LLC + */ + +#include "rust_binder.h" + +const char * const binder_command_strings[] =3D { + "BC_TRANSACTION", + "BC_REPLY", + "BC_ACQUIRE_RESULT", + "BC_FREE_BUFFER", + "BC_INCREFS", + "BC_ACQUIRE", + "BC_RELEASE", + "BC_DECREFS", + "BC_INCREFS_DONE", + "BC_ACQUIRE_DONE", + "BC_ATTEMPT_ACQUIRE", + "BC_REGISTER_LOOPER", + "BC_ENTER_LOOPER", + "BC_EXIT_LOOPER", + "BC_REQUEST_DEATH_NOTIFICATION", + "BC_CLEAR_DEATH_NOTIFICATION", + "BC_DEAD_BINDER_DONE", + "BC_TRANSACTION_SG", + "BC_REPLY_SG", +}; + +const char * const binder_return_strings[] =3D { + "BR_ERROR", + "BR_OK", + "BR_TRANSACTION", + "BR_REPLY", + "BR_ACQUIRE_RESULT", + "BR_DEAD_REPLY", + "BR_TRANSACTION_COMPLETE", + "BR_INCREFS", + "BR_ACQUIRE", + "BR_RELEASE", + "BR_DECREFS", + "BR_ATTEMPT_ACQUIRE", + "BR_NOOP", + "BR_SPAWN_LOOPER", + "BR_FINISHED", + "BR_DEAD_BINDER", + "BR_CLEAR_DEATH_NOTIFICATION_DONE", + "BR_FAILED_REPLY", + "BR_FROZEN_REPLY", + "BR_ONEWAY_SPAM_SUSPECT", + "BR_TRANSACTION_PENDING_FROZEN" +}; + +#define CREATE_TRACE_POINTS +#define CREATE_RUST_TRACE_POINTS +#include "rust_binder_events.h" diff --git a/drivers/android/binder/rust_binder_events.h b/drivers/android/= binder/rust_binder_events.h new file mode 100644 index 0000000000000000000000000000000000000000..2f3efbf9dba68e6415f0f09ff5c= 8255dc5c5bc00 --- /dev/null +++ b/drivers/android/binder/rust_binder_events.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2025 Google, Inc. + */ + +#undef TRACE_SYSTEM +#undef TRACE_INCLUDE_FILE +#undef TRACE_INCLUDE_PATH +#define TRACE_SYSTEM rust_binder +#define TRACE_INCLUDE_FILE rust_binder_events +#define TRACE_INCLUDE_PATH ../drivers/android/binder + +#if !defined(_RUST_BINDER_TRACE_H) || defined(TRACE_HEADER_MULTI_READ) +#define _RUST_BINDER_TRACE_H + +#include + +TRACE_EVENT(rust_binder_ioctl, + TP_PROTO(unsigned int cmd, unsigned long arg), + TP_ARGS(cmd, arg), + + TP_STRUCT__entry( + __field(unsigned int, cmd) + __field(unsigned long, arg) + ), + TP_fast_assign( + __entry->cmd =3D cmd; + __entry->arg =3D arg; + ), + TP_printk("cmd=3D0x%x arg=3D0x%lx", __entry->cmd, __entry->arg) +); + +#endif /* _RUST_BINDER_TRACE_H */ + +/* This part must be outside protection */ +#include diff --git a/drivers/android/binder/rust_binder_internal.h b/drivers/androi= d/binder/rust_binder_internal.h new file mode 100644 index 0000000000000000000000000000000000000000..78288fe7964d804205105889db3= 918bd7fa51623 --- /dev/null +++ b/drivers/android/binder/rust_binder_internal.h @@ -0,0 +1,87 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* rust_binder_internal.h + * + * This file contains internal data structures used by Rust Binder. Mostly, + * these are type definitions used only by binderfs or things that Rust Bi= nder + * define and export to binderfs. 
+ * + * It does not include things exported by binderfs to Rust Binder since th= is + * file is not included as input to bindgen. + * + * Copyright (C) 2025 Google LLC. + */ + +#ifndef _LINUX_RUST_BINDER_INTERNAL_H +#define _LINUX_RUST_BINDER_INTERNAL_H + +#define RUST_BINDERFS_SUPER_MAGIC 0x6c6f6f71 + +#include +#include +#include + +/* + * The internal data types in the Rust Binder driver are opaque to C, so w= e use + * void pointer typedefs for these types. + */ +typedef void *rust_binder_context; + +/** + * struct binder_device - information about a binder device node + * @minor: the minor number used by this device + * @ctx: the Rust Context used by this device, or null for binder-co= ntrol + * + * This is used as the private data for files directly in binderfs, but not + * files in the binder_logs subdirectory. This struct owns a refcount on `= ctx` + * and the entry for `minor` in `binderfs_minors`. For binder-control `ctx= ` is + * null. + */ +struct binder_device { + int minor; + rust_binder_context ctx; +}; + +int rust_binder_stats_show(struct seq_file *m, void *unused); +int rust_binder_state_show(struct seq_file *m, void *unused); +int rust_binder_transactions_show(struct seq_file *m, void *unused); +int rust_binder_proc_show(struct seq_file *m, void *pid); + +extern const struct file_operations rust_binder_fops; +rust_binder_context rust_binder_new_context(char *name); +void rust_binder_remove_context(rust_binder_context device); + +/** + * binderfs_mount_opts - mount options for binderfs + * @max: maximum number of allocatable binderfs binder devices + * @stats_mode: enable binder stats in binderfs. + */ +struct binderfs_mount_opts { + int max; + int stats_mode; +}; + +/** + * binderfs_info - information about a binderfs mount + * @ipc_ns: The ipc namespace the binderfs mount belongs to. + * @control_dentry: This records the dentry of this binderfs mount + * binder-control device. + * @root_uid: uid that needs to be used when a new binder device is + * created. + * @root_gid: gid that needs to be used when a new binder device is + * created. + * @mount_opts: The mount options in use. + * @device_count: The current number of allocated binder devices. + * @proc_log_dir: Pointer to the directory dentry containing process-spe= cific + * logs. + */ +struct binderfs_info { + struct ipc_namespace *ipc_ns; + struct dentry *control_dentry; + kuid_t root_uid; + kgid_t root_gid; + struct binderfs_mount_opts mount_opts; + int device_count; + struct dentry *proc_log_dir; +}; + +#endif /* _LINUX_RUST_BINDER_INTERNAL_H */ diff --git a/drivers/android/binder/rust_binder_main.rs b/drivers/android/b= inder/rust_binder_main.rs new file mode 100644 index 0000000000000000000000000000000000000000..6773b7c273ec9634057300954d6= 7b51ca9b54f6f --- /dev/null +++ b/drivers/android/binder/rust_binder_main.rs @@ -0,0 +1,627 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. + +//! Binder -- the Android IPC mechanism. 
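// Standalone sketch of the opaque-handle pattern used for
// `rust_binder_context`: C only ever sees a type-erased pointer that owns a
// reference count, created by `rust_binder_new_context()` and released by
// `rust_binder_remove_context()` (which tolerates NULL, since binder-control
// has no context). std's Arc and raw-pointer round-trip stand in for the
// kernel's `ForeignOwnable`; the names below are illustrative only.
use std::ffi::c_void;
use std::sync::Arc;

struct Context {
    name: String,
}

// Hands ownership of one refcount to the C side.
fn new_context(name: &str) -> *mut c_void {
    Arc::into_raw(Arc::new(Context { name: name.into() })) as *mut c_void
}

// Takes that refcount back; `ptr` must come from `new_context` and must not
// be used again afterwards.
unsafe fn remove_context(ptr: *mut c_void) {
    if !ptr.is_null() {
        drop(unsafe { Arc::from_raw(ptr as *const Context) });
    }
}

fn main() {
    let ctx = new_context("binder");
    // ... C keeps this in `struct binder_device::ctx` and hands it back ...
    unsafe { remove_context(ctx) };
}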
+#![recursion_limit =3D "256"] +#![allow( + clippy::as_underscore, + clippy::ref_as_ptr, + clippy::ptr_as_ptr, + clippy::cast_lossless +)] + +use kernel::{ + bindings::{self, seq_file}, + fs::File, + list::{ListArc, ListArcSafe, ListLinksSelfPtr, TryNewListArc}, + prelude::*, + seq_file::SeqFile, + seq_print, + sync::poll::PollTable, + sync::Arc, + task::Pid, + transmute::AsBytes, + types::ForeignOwnable, + uaccess::UserSliceWriter, +}; + +use crate::{context::Context, page_range::Shrinker, process::Process, thre= ad::Thread}; + +use core::{ + ptr::NonNull, + sync::atomic::{AtomicBool, AtomicUsize, Ordering}, +}; + +mod allocation; +mod context; +mod deferred_close; +mod defs; +mod error; +mod node; +mod page_range; +mod process; +mod range_alloc; +mod stats; +mod thread; +mod trace; +mod transaction; + +#[allow(warnings)] // generated bindgen code +mod binderfs { + use kernel::bindings::{dentry, inode}; + + extern "C" { + pub fn init_rust_binderfs() -> kernel::ffi::c_int; + } + extern "C" { + pub fn rust_binderfs_create_proc_file( + nodp: *mut inode, + pid: kernel::ffi::c_int, + ) -> *mut dentry; + } + extern "C" { + pub fn rust_binderfs_remove_file(dentry: *mut dentry); + } + pub type rust_binder_context =3D *mut kernel::ffi::c_void; + #[repr(C)] + #[derive(Copy, Clone)] + pub struct binder_device { + pub minor: kernel::ffi::c_int, + pub ctx: rust_binder_context, + } + impl Default for binder_device { + fn default() -> Self { + let mut s =3D ::core::mem::MaybeUninit::::uninit(); + unsafe { + ::core::ptr::write_bytes(s.as_mut_ptr(), 0, 1); + s.assume_init() + } + } + } +} + +module! { + type: BinderModule, + name: "rust_binder", + authors: ["Wedson Almeida Filho", "Alice Ryhl"], + description: "Android Binder", + license: "GPL", +} + +fn next_debug_id() -> usize { + static NEXT_DEBUG_ID: AtomicUsize =3D AtomicUsize::new(0); + + NEXT_DEBUG_ID.fetch_add(1, Ordering::Relaxed) +} + +/// Provides a single place to write Binder return values via the +/// supplied `UserSliceWriter`. +pub(crate) struct BinderReturnWriter<'a> { + writer: UserSliceWriter, + thread: &'a Thread, +} + +impl<'a> BinderReturnWriter<'a> { + fn new(writer: UserSliceWriter, thread: &'a Thread) -> Self { + BinderReturnWriter { writer, thread } + } + + /// Write a return code back to user space. + /// Should be a `BR_` constant from [`defs`] e.g. [`defs::BR_TRANSACTI= ON_COMPLETE`]. + fn write_code(&mut self, code: u32) -> Result { + stats::GLOBAL_STATS.inc_br(code); + self.thread.process.stats.inc_br(code); + self.writer.write(&code) + } + + /// Write something *other than* a return code to user space. + fn write_payload(&mut self, payload: &T) -> Result { + self.writer.write(payload) + } + + fn len(&self) -> usize { + self.writer.len() + } +} + +/// Specifies how a type should be delivered to the read part of a BINDER_= WRITE_READ ioctl. +/// +/// When a value is pushed to the todo list for a process or thread, it is= stored as a trait object +/// with the type `Arc`. Trait objects are a Rust featu= re that lets you +/// implement dynamic dispatch over many different types. This lets us sto= re many different types +/// in the todo list. +trait DeliverToRead: ListArcSafe + Send + Sync { + /// Performs work. Returns true if remaining work items in the queue s= hould be processed + /// immediately, or false if it should return to caller before process= ing additional work + /// items. + fn do_work( + self: DArc, + thread: &Thread, + writer: &mut BinderReturnWriter<'_>, + ) -> Result; + + /// Cancels the given work item. 
This is called instead of [`DeliverTo= Read::do_work`] when work + /// won't be delivered. + fn cancel(self: DArc); + + /// Should we use `wake_up_interruptible_sync` or `wake_up_interruptib= le` when scheduling this + /// work item? + /// + /// Generally only set to true for non-oneway transactions. + fn should_sync_wakeup(&self) -> bool; + + fn debug_print(&self, m: &SeqFile, prefix: &str, transaction_prefix: &= str) -> Result<()>; +} + +// Wrapper around a `DeliverToRead` with linked list links. +#[pin_data] +struct DTRWrap { + #[pin] + links: ListLinksSelfPtr>, + #[pin] + wrapped: T, +} +kernel::list::impl_list_arc_safe! { + impl{T: ListArcSafe + ?Sized} ListArcSafe<0> for DTRWrap { + tracked_by wrapped: T; + } +} +kernel::list::impl_list_item! { + impl ListItem<0> for DTRWrap { + using ListLinksSelfPtr { self.links }; + } +} + +impl core::ops::Deref for DTRWrap { + type Target =3D T; + fn deref(&self) -> &T { + &self.wrapped + } +} + +type DArc =3D kernel::sync::Arc>; +type DLArc =3D kernel::list::ListArc>; + +impl DTRWrap { + fn new(val: impl PinInit) -> impl PinInit { + pin_init!(Self { + links <- ListLinksSelfPtr::new(), + wrapped <- val, + }) + } + + fn arc_try_new(val: T) -> Result, kernel::alloc::AllocError> { + ListArc::pin_init( + try_pin_init!(Self { + links <- ListLinksSelfPtr::new(), + wrapped: val, + }), + GFP_KERNEL, + ) + .map_err(|_| kernel::alloc::AllocError) + } + + fn arc_pin_init(init: impl PinInit) -> Result, kernel::err= or::Error> { + ListArc::pin_init( + try_pin_init!(Self { + links <- ListLinksSelfPtr::new(), + wrapped <- init, + }), + GFP_KERNEL, + ) + } +} + +struct DeliverCode { + code: u32, + skip: AtomicBool, +} + +kernel::list::impl_list_arc_safe! { + impl ListArcSafe<0> for DeliverCode { untracked; } +} + +impl DeliverCode { + fn new(code: u32) -> Self { + Self { + code, + skip: AtomicBool::new(false), + } + } + + /// Disable this DeliverCode and make it do nothing. + /// + /// This is used instead of removing it from the work list, since `Lin= kedList::remove` is + /// unsafe, whereas this method is not. + fn skip(&self) { + self.skip.store(true, Ordering::Relaxed); + } +} + +impl DeliverToRead for DeliverCode { + fn do_work( + self: DArc, + _thread: &Thread, + writer: &mut BinderReturnWriter<'_>, + ) -> Result { + if !self.skip.load(Ordering::Relaxed) { + writer.write_code(self.code)?; + } + Ok(true) + } + + fn cancel(self: DArc) {} + + fn should_sync_wakeup(&self) -> bool { + false + } + + fn debug_print(&self, m: &SeqFile, prefix: &str, _tprefix: &str) -> Re= sult<()> { + seq_print!(m, "{}", prefix); + if self.skip.load(Ordering::Relaxed) { + seq_print!(m, "(skipped) "); + } + if self.code =3D=3D defs::BR_TRANSACTION_COMPLETE { + seq_print!(m, "transaction complete\n"); + } else { + seq_print!(m, "transaction error: {}\n", self.code); + } + Ok(()) + } +} + +fn ptr_align(value: usize) -> Option { + let size =3D core::mem::size_of::() - 1; + Some(value.checked_add(size)? & !size) +} + +// SAFETY: We call register in `init`. +static BINDER_SHRINKER: Shrinker =3D unsafe { Shrinker::new() }; + +struct BinderModule {} + +impl kernel::Module for BinderModule { + fn init(_module: &'static kernel::ThisModule) -> Result { + // SAFETY: The module initializer never runs twice, so we only cal= l this once. + unsafe { crate::context::CONTEXTS.init() }; + + pr_warn!("Loaded Rust Binder."); + + BINDER_SHRINKER.register(kernel::c_str!("android-binder"))?; + + // SAFETY: The module is being loaded, so we can initialize binder= fs. 
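// Standalone sketch of the todo-list machinery above: work items are queued
// as trait objects so one list can carry transactions, completion codes and
// other kinds of work, and an item is "cancelled" by setting a skip flag
// rather than by unlinking it from the middle of the list. `Work` and `Code`
// are simplified stand-ins for `DeliverToRead` and `DeliverCode`; std
// containers replace the kernel's intrusive list.
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

trait Work {
    // Returns true if the caller should keep draining the queue.
    fn do_work(self: Arc<Self>, out: &mut Vec<u32>) -> bool;
}

struct Code {
    code: u32,
    skip: AtomicBool,
}

impl Work for Code {
    fn do_work(self: Arc<Self>, out: &mut Vec<u32>) -> bool {
        if !self.skip.load(Ordering::Relaxed) {
            out.push(self.code);
        }
        true
    }
}

fn main() {
    let cancelled = Arc::new(Code { code: 1, skip: AtomicBool::new(false) });
    let mut todo: VecDeque<Arc<dyn Work>> = VecDeque::new();
    todo.push_back(cancelled.clone());
    todo.push_back(Arc::new(Code { code: 2, skip: AtomicBool::new(false) }));

    // Cancel the first item without touching the list structure.
    cancelled.skip.store(true, Ordering::Relaxed);

    let mut out = Vec::new();
    while let Some(item) = todo.pop_front() {
        if !item.do_work(&mut out) {
            break; // hand control back before processing more work
        }
    }
    assert_eq!(out, [2]); // the skipped item produced nothing
}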
+ unsafe { kernel::error::to_result(binderfs::init_rust_binderfs())?= }; + + Ok(Self {}) + } +} + +/// Makes the inner type Sync. +#[repr(transparent)] +pub struct AssertSync(T); +// SAFETY: Used only to insert `file_operations` into a global, which is s= afe. +unsafe impl Sync for AssertSync {} + +/// File operations that rust_binderfs.c can use. +#[no_mangle] +#[used] +pub static rust_binder_fops: AssertSync= =3D { + // SAFETY: All zeroes is safe for the `file_operations` type. + let zeroed_ops =3D unsafe { core::mem::MaybeUninit::zeroed().assume_in= it() }; + + let ops =3D kernel::bindings::file_operations { + owner: THIS_MODULE.as_ptr(), + poll: Some(rust_binder_poll), + unlocked_ioctl: Some(rust_binder_unlocked_ioctl), + compat_ioctl: Some(rust_binder_compat_ioctl), + mmap: Some(rust_binder_mmap), + open: Some(rust_binder_open), + release: Some(rust_binder_release), + flush: Some(rust_binder_flush), + ..zeroed_ops + }; + AssertSync(ops) +}; + +/// # Safety +/// Only called by binderfs. +#[no_mangle] +unsafe extern "C" fn rust_binder_new_context( + name: *const kernel::ffi::c_char, +) -> *mut kernel::ffi::c_void { + // SAFETY: The caller will always provide a valid c string here. + let name =3D unsafe { kernel::str::CStr::from_char_ptr(name) }; + match Context::new(name) { + Ok(ctx) =3D> Arc::into_foreign(ctx), + Err(_err) =3D> core::ptr::null_mut(), + } +} + +/// # Safety +/// Only called by binderfs. +#[no_mangle] +unsafe extern "C" fn rust_binder_remove_context(device: *mut kernel::ffi::= c_void) { + if !device.is_null() { + // SAFETY: The caller ensures that the `device` pointer came from = a previous call to + // `rust_binder_new_device`. + let ctx =3D unsafe { Arc::::from_foreign(device) }; + ctx.deregister(); + drop(ctx); + } +} + +/// # Safety +/// Only called by binderfs. +unsafe extern "C" fn rust_binder_open( + inode: *mut bindings::inode, + file_ptr: *mut bindings::file, +) -> kernel::ffi::c_int { + // SAFETY: The `rust_binderfs.c` file ensures that `i_private` is set = to a + // `struct binder_device`. + let device =3D unsafe { (*inode).i_private } as *const binderfs::binde= r_device; + + assert!(!device.is_null()); + + // SAFETY: The `rust_binderfs.c` file ensures that `device->ctx` holds= a binder context when + // using the rust binder fops. + let ctx =3D unsafe { Arc::::borrow((*device).ctx) }; + + // SAFETY: The caller provides a valid file pointer to a new `struct f= ile`. + let file =3D unsafe { File::from_raw_file(file_ptr) }; + let process =3D match Process::open(ctx, file) { + Ok(process) =3D> process, + Err(err) =3D> return err.to_errno(), + }; + + // SAFETY: This is an `inode` for a newly created binder file. + match unsafe { BinderfsProcFile::new(inode, process.task.pid()) } { + Ok(Some(file)) =3D> process.inner.lock().binderfs_file =3D Some(fi= le), + Ok(None) =3D> { /* pid already exists */ } + Err(err) =3D> return err.to_errno(), + } + + // SAFETY: This file is associated with Rust binder, so we own the `pr= ivate_data` field. + unsafe { (*file_ptr).private_data =3D process.into_foreign() }; + 0 +} + +/// # Safety +/// Only called by binderfs. +unsafe extern "C" fn rust_binder_release( + _inode: *mut bindings::inode, + file: *mut bindings::file, +) -> kernel::ffi::c_int { + // SAFETY: We previously set `private_data` in `rust_binder_open`. + let process =3D unsafe { Arc::::from_foreign((*file).private_= data) }; + // SAFETY: The caller ensures that the file is valid. 
+ let file =3D unsafe { File::from_raw_file(file) }; + Process::release(process, file); + 0 +} + +/// # Safety +/// Only called by binderfs. +unsafe extern "C" fn rust_binder_compat_ioctl( + file: *mut bindings::file, + cmd: kernel::ffi::c_uint, + arg: kernel::ffi::c_ulong, +) -> kernel::ffi::c_long { + // SAFETY: We previously set `private_data` in `rust_binder_open`. + let f =3D unsafe { Arc::::borrow((*file).private_data) }; + // SAFETY: The caller ensures that the file is valid. + match Process::compat_ioctl(f, unsafe { File::from_raw_file(file) }, c= md as _, arg as _) { + Ok(()) =3D> 0, + Err(err) =3D> err.to_errno() as isize, + } +} + +/// # Safety +/// Only called by binderfs. +unsafe extern "C" fn rust_binder_unlocked_ioctl( + file: *mut bindings::file, + cmd: kernel::ffi::c_uint, + arg: kernel::ffi::c_ulong, +) -> kernel::ffi::c_long { + // SAFETY: We previously set `private_data` in `rust_binder_open`. + let f =3D unsafe { Arc::::borrow((*file).private_data) }; + // SAFETY: The caller ensures that the file is valid. + match Process::ioctl(f, unsafe { File::from_raw_file(file) }, cmd as _= , arg as _) { + Ok(()) =3D> 0, + Err(err) =3D> err.to_errno() as isize, + } +} + +/// # Safety +/// Only called by binderfs. +unsafe extern "C" fn rust_binder_mmap( + file: *mut bindings::file, + vma: *mut bindings::vm_area_struct, +) -> kernel::ffi::c_int { + // SAFETY: We previously set `private_data` in `rust_binder_open`. + let f =3D unsafe { Arc::::borrow((*file).private_data) }; + // SAFETY: The caller ensures that the vma is valid. + let area =3D unsafe { kernel::mm::virt::VmaNew::from_raw(vma) }; + // SAFETY: The caller ensures that the file is valid. + match Process::mmap(f, unsafe { File::from_raw_file(file) }, area) { + Ok(()) =3D> 0, + Err(err) =3D> err.to_errno(), + } +} + +/// # Safety +/// Only called by binderfs. +unsafe extern "C" fn rust_binder_poll( + file: *mut bindings::file, + wait: *mut bindings::poll_table_struct, +) -> bindings::__poll_t { + // SAFETY: We previously set `private_data` in `rust_binder_open`. + let f =3D unsafe { Arc::::borrow((*file).private_data) }; + // SAFETY: The caller ensures that the file is valid. + let fileref =3D unsafe { File::from_raw_file(file) }; + // SAFETY: The caller ensures that the `PollTable` is valid. + match Process::poll(f, fileref, unsafe { PollTable::from_raw(wait) }) { + Ok(v) =3D> v, + Err(_) =3D> bindings::POLLERR, + } +} + +/// # Safety +/// Only called by binderfs. +unsafe extern "C" fn rust_binder_flush( + file: *mut bindings::file, + _id: bindings::fl_owner_t, +) -> kernel::ffi::c_int { + // SAFETY: We previously set `private_data` in `rust_binder_open`. + let f =3D unsafe { Arc::::borrow((*file).private_data) }; + match Process::flush(f) { + Ok(()) =3D> 0, + Err(err) =3D> err.to_errno(), + } +} + +/// # Safety +/// Only called by binderfs. +#[no_mangle] +unsafe extern "C" fn rust_binder_stats_show( + ptr: *mut seq_file, + _: *mut kernel::ffi::c_void, +) -> kernel::ffi::c_int { + // SAFETY: The caller ensures that the pointer is valid and exclusive = for the duration in which + // this method is called. + let m =3D unsafe { SeqFile::from_raw(ptr) }; + if let Err(err) =3D rust_binder_stats_show_impl(m) { + seq_print!(m, "failed to generate state: {:?}\n", err); + } + 0 +} + +/// # Safety +/// Only called by binderfs. 
+#[no_mangle] +unsafe extern "C" fn rust_binder_state_show( + ptr: *mut seq_file, + _: *mut kernel::ffi::c_void, +) -> kernel::ffi::c_int { + // SAFETY: The caller ensures that the pointer is valid and exclusive = for the duration in which + // this method is called. + let m =3D unsafe { SeqFile::from_raw(ptr) }; + if let Err(err) =3D rust_binder_state_show_impl(m) { + seq_print!(m, "failed to generate state: {:?}\n", err); + } + 0 +} + +/// # Safety +/// Only called by binderfs. +#[no_mangle] +unsafe extern "C" fn rust_binder_proc_show( + ptr: *mut seq_file, + _: *mut kernel::ffi::c_void, +) -> kernel::ffi::c_int { + // SAFETY: Accessing the private field of `seq_file` is okay. + let pid =3D (unsafe { (*ptr).private }) as usize as Pid; + // SAFETY: The caller ensures that the pointer is valid and exclusive = for the duration in which + // this method is called. + let m =3D unsafe { SeqFile::from_raw(ptr) }; + if let Err(err) =3D rust_binder_proc_show_impl(m, pid) { + seq_print!(m, "failed to generate state: {:?}\n", err); + } + 0 +} + +/// # Safety +/// Only called by binderfs. +#[no_mangle] +unsafe extern "C" fn rust_binder_transactions_show( + ptr: *mut seq_file, + _: *mut kernel::ffi::c_void, +) -> kernel::ffi::c_int { + // SAFETY: The caller ensures that the pointer is valid and exclusive = for the duration in which + // this method is called. + let m =3D unsafe { SeqFile::from_raw(ptr) }; + if let Err(err) =3D rust_binder_transactions_show_impl(m) { + seq_print!(m, "failed to generate state: {:?}\n", err); + } + 0 +} + +fn rust_binder_transactions_show_impl(m: &SeqFile) -> Result<()> { + seq_print!(m, "binder transactions:\n"); + let contexts =3D context::get_all_contexts()?; + for ctx in contexts { + let procs =3D ctx.get_all_procs()?; + for proc in procs { + proc.debug_print(m, &ctx, false)?; + seq_print!(m, "\n"); + } + } + Ok(()) +} + +fn rust_binder_stats_show_impl(m: &SeqFile) -> Result<()> { + seq_print!(m, "binder stats:\n"); + stats::GLOBAL_STATS.debug_print("", m); + let contexts =3D context::get_all_contexts()?; + for ctx in contexts { + let procs =3D ctx.get_all_procs()?; + for proc in procs { + proc.debug_print_stats(m, &ctx)?; + seq_print!(m, "\n"); + } + } + Ok(()) +} + +fn rust_binder_state_show_impl(m: &SeqFile) -> Result<()> { + seq_print!(m, "binder state:\n"); + let contexts =3D context::get_all_contexts()?; + for ctx in contexts { + let procs =3D ctx.get_all_procs()?; + for proc in procs { + proc.debug_print(m, &ctx, true)?; + seq_print!(m, "\n"); + } + } + Ok(()) +} + +fn rust_binder_proc_show_impl(m: &SeqFile, pid: Pid) -> Result<()> { + seq_print!(m, "binder proc state:\n"); + let contexts =3D context::get_all_contexts()?; + for ctx in contexts { + let procs =3D ctx.get_procs_with_pid(pid)?; + for proc in procs { + proc.debug_print(m, &ctx, true)?; + seq_print!(m, "\n"); + } + } + Ok(()) +} + +struct BinderfsProcFile(NonNull); + +// SAFETY: Safe to drop any thread. +unsafe impl Send for BinderfsProcFile {} + +impl BinderfsProcFile { + /// # Safety + /// + /// Takes an inode from a newly created binder file. + unsafe fn new(nodp: *mut bindings::inode, pid: i32) -> Result> { + // SAFETY: The caller passes an `inode` for a newly created binder= file. 
+ let dentry =3D unsafe { binderfs::rust_binderfs_create_proc_file(n= odp, pid) }; + match kernel::error::from_err_ptr(dentry) { + Ok(dentry) =3D> Ok(NonNull::new(dentry).map(Self)), + Err(err) if err =3D=3D EEXIST =3D> Ok(None), + Err(err) =3D> Err(err), + } + } +} + +impl Drop for BinderfsProcFile { + fn drop(&mut self) { + // SAFETY: This is a dentry from `rust_binderfs_remove_file` that = has not been deleted yet. + unsafe { binderfs::rust_binderfs_remove_file(self.0.as_ptr()) }; + } +} diff --git a/drivers/android/binder/rust_binderfs.c b/drivers/android/binde= r/rust_binderfs.c new file mode 100644 index 0000000000000000000000000000000000000000..6b497146b698b3d031f8fe7d326= 4f3fadbed8722 --- /dev/null +++ b/drivers/android/binder/rust_binderfs.c @@ -0,0 +1,850 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "rust_binder.h" +#include "rust_binder_internal.h" + +#define FIRST_INODE 1 +#define SECOND_INODE 2 +#define INODE_OFFSET 3 +#define BINDERFS_MAX_MINOR (1U << MINORBITS) +/* Ensure that the initial ipc namespace always has devices available. */ +#define BINDERFS_MAX_MINOR_CAPPED (BINDERFS_MAX_MINOR - 4) + +DEFINE_SHOW_ATTRIBUTE(rust_binder_stats); +DEFINE_SHOW_ATTRIBUTE(rust_binder_state); +DEFINE_SHOW_ATTRIBUTE(rust_binder_transactions); +DEFINE_SHOW_ATTRIBUTE(rust_binder_proc); + +char *rust_binder_devices_param =3D CONFIG_ANDROID_BINDER_DEVICES; +module_param_named(rust_devices, rust_binder_devices_param, charp, 0444); + +static dev_t binderfs_dev; +static DEFINE_MUTEX(binderfs_minors_mutex); +static DEFINE_IDA(binderfs_minors); + +enum binderfs_param { + Opt_max, + Opt_stats_mode, +}; + +enum binderfs_stats_mode { + binderfs_stats_mode_unset, + binderfs_stats_mode_global, +}; + +struct binder_features { + bool oneway_spam_detection; + bool extended_error; + bool freeze_notification; +}; + +static const struct constant_table binderfs_param_stats[] =3D { + { "global", binderfs_stats_mode_global }, + {} +}; + +static const struct fs_parameter_spec binderfs_fs_parameters[] =3D { + fsparam_u32("max", Opt_max), + fsparam_enum("stats", Opt_stats_mode, binderfs_param_stats), + {} +}; + +static struct binder_features binder_features =3D { + .oneway_spam_detection =3D true, + .extended_error =3D true, + .freeze_notification =3D true, +}; + +static inline struct binderfs_info *BINDERFS_SB(const struct super_block *= sb) +{ + return sb->s_fs_info; +} + +/** + * binderfs_binder_device_create - allocate inode from super block of a + * binderfs mount + * @ref_inode: inode from wich the super block will be taken + * @userp: buffer to copy information about new device for userspace to + * @req: struct binderfs_device as copied from userspace + * + * This function allocates a new binder_device and reserves a new minor + * number for it. + * Minor numbers are limited and tracked globally in binderfs_minors. The + * function will stash a struct binder_device for the specific binder + * device in i_private of the inode. + * It will go on to allocate a new inode from the super block of the + * filesystem mount, stash a struct binder_device in its i_private field + * and attach a dentry to that inode. 
+ * + * Return: 0 on success, negative errno on failure + */ +static int binderfs_binder_device_create(struct inode *ref_inode, + struct binderfs_device __user *userp, + struct binderfs_device *req) +{ + int minor, ret; + struct dentry *dentry, *root; + struct binder_device *device =3D NULL; + rust_binder_context ctx =3D NULL; + struct inode *inode =3D NULL; + struct super_block *sb =3D ref_inode->i_sb; + struct binderfs_info *info =3D sb->s_fs_info; +#if defined(CONFIG_IPC_NS) + bool use_reserve =3D (info->ipc_ns =3D=3D &init_ipc_ns); +#else + bool use_reserve =3D true; +#endif + + /* Reserve new minor number for the new device. */ + mutex_lock(&binderfs_minors_mutex); + if (++info->device_count <=3D info->mount_opts.max) + minor =3D ida_alloc_max(&binderfs_minors, + use_reserve ? BINDERFS_MAX_MINOR : + BINDERFS_MAX_MINOR_CAPPED, + GFP_KERNEL); + else + minor =3D -ENOSPC; + if (minor < 0) { + --info->device_count; + mutex_unlock(&binderfs_minors_mutex); + return minor; + } + mutex_unlock(&binderfs_minors_mutex); + + ret =3D -ENOMEM; + device =3D kzalloc(sizeof(*device), GFP_KERNEL); + if (!device) + goto err; + + req->name[BINDERFS_MAX_NAME] =3D '\0'; /* NUL-terminate */ + + ctx =3D rust_binder_new_context(req->name); + if (!ctx) + goto err; + + inode =3D new_inode(sb); + if (!inode) + goto err; + + inode->i_ino =3D minor + INODE_OFFSET; + simple_inode_init_ts(inode); + init_special_inode(inode, S_IFCHR | 0600, + MKDEV(MAJOR(binderfs_dev), minor)); + inode->i_fop =3D &rust_binder_fops; + inode->i_uid =3D info->root_uid; + inode->i_gid =3D info->root_gid; + + req->major =3D MAJOR(binderfs_dev); + req->minor =3D minor; + device->ctx =3D ctx; + device->minor =3D minor; + + if (userp && copy_to_user(userp, req, sizeof(*req))) { + ret =3D -EFAULT; + goto err; + } + + root =3D sb->s_root; + inode_lock(d_inode(root)); + + /* look it up */ + dentry =3D lookup_noperm(&QSTR(req->name), root); + if (IS_ERR(dentry)) { + inode_unlock(d_inode(root)); + ret =3D PTR_ERR(dentry); + goto err; + } + + if (d_really_is_positive(dentry)) { + /* already exists */ + dput(dentry); + inode_unlock(d_inode(root)); + ret =3D -EEXIST; + goto err; + } + + inode->i_private =3D device; + d_instantiate(dentry, inode); + fsnotify_create(root->d_inode, dentry); + inode_unlock(d_inode(root)); + + return 0; + +err: + kfree(device); + rust_binder_remove_context(ctx); + mutex_lock(&binderfs_minors_mutex); + --info->device_count; + ida_free(&binderfs_minors, minor); + mutex_unlock(&binderfs_minors_mutex); + iput(inode); + + return ret; +} + +/** + * binder_ctl_ioctl - handle binder device node allocation requests + * + * The request handler for the binder-control device. All requests operate= on + * the binderfs mount the binder-control device resides in: + * - BINDER_CTL_ADD + * Allocate a new binder device. + * + * Return: %0 on success, negative errno on failure. 
+ */ +static long binder_ctl_ioctl(struct file *file, unsigned int cmd, + unsigned long arg) +{ + int ret =3D -EINVAL; + struct inode *inode =3D file_inode(file); + struct binderfs_device __user *device =3D (struct binderfs_device __user = *)arg; + struct binderfs_device device_req; + + switch (cmd) { + case BINDER_CTL_ADD: + ret =3D copy_from_user(&device_req, device, sizeof(device_req)); + if (ret) { + ret =3D -EFAULT; + break; + } + + ret =3D binderfs_binder_device_create(inode, device, &device_req); + break; + default: + break; + } + + return ret; +} + +static void binderfs_evict_inode(struct inode *inode) +{ + struct binder_device *device =3D inode->i_private; + struct binderfs_info *info =3D BINDERFS_SB(inode->i_sb); + + clear_inode(inode); + + if (!S_ISCHR(inode->i_mode) || !device) + return; + + mutex_lock(&binderfs_minors_mutex); + --info->device_count; + ida_free(&binderfs_minors, device->minor); + mutex_unlock(&binderfs_minors_mutex); + + /* ctx is null for binder-control, but this function ignores null pointer= s */ + rust_binder_remove_context(device->ctx); + + kfree(device); +} + +static int binderfs_fs_context_parse_param(struct fs_context *fc, + struct fs_parameter *param) +{ + int opt; + struct binderfs_mount_opts *ctx =3D fc->fs_private; + struct fs_parse_result result; + + opt =3D fs_parse(fc, binderfs_fs_parameters, param, &result); + if (opt < 0) + return opt; + + switch (opt) { + case Opt_max: + if (result.uint_32 > BINDERFS_MAX_MINOR) + return invalfc(fc, "Bad value for '%s'", param->key); + + ctx->max =3D result.uint_32; + break; + case Opt_stats_mode: + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + ctx->stats_mode =3D result.uint_32; + break; + default: + return invalfc(fc, "Unsupported parameter '%s'", param->key); + } + + return 0; +} + +static int binderfs_fs_context_reconfigure(struct fs_context *fc) +{ + struct binderfs_mount_opts *ctx =3D fc->fs_private; + struct binderfs_info *info =3D BINDERFS_SB(fc->root->d_sb); + + if (info->mount_opts.stats_mode !=3D ctx->stats_mode) + return invalfc(fc, "Binderfs stats mode cannot be changed during a remou= nt"); + + info->mount_opts.stats_mode =3D ctx->stats_mode; + info->mount_opts.max =3D ctx->max; + return 0; +} + +static int binderfs_show_options(struct seq_file *seq, struct dentry *root) +{ + struct binderfs_info *info =3D BINDERFS_SB(root->d_sb); + + if (info->mount_opts.max <=3D BINDERFS_MAX_MINOR) + seq_printf(seq, ",max=3D%d", info->mount_opts.max); + + switch (info->mount_opts.stats_mode) { + case binderfs_stats_mode_unset: + break; + case binderfs_stats_mode_global: + seq_puts(seq, ",stats=3Dglobal"); + break; + } + + return 0; +} + +static const struct super_operations binderfs_super_ops =3D { + .evict_inode =3D binderfs_evict_inode, + .show_options =3D binderfs_show_options, + .statfs =3D simple_statfs, +}; + +static inline bool is_binderfs_control_device(const struct dentry *dentry) +{ + struct binderfs_info *info =3D dentry->d_sb->s_fs_info; + + return info->control_dentry =3D=3D dentry; +} + +static int binderfs_rename(struct mnt_idmap *idmap, + struct inode *old_dir, struct dentry *old_dentry, + struct inode *new_dir, struct dentry *new_dentry, + unsigned int flags) +{ + if (is_binderfs_control_device(old_dentry) || + is_binderfs_control_device(new_dentry)) + return -EPERM; + + return simple_rename(idmap, old_dir, old_dentry, new_dir, + new_dentry, flags); +} + +static int binderfs_unlink(struct inode *dir, struct dentry *dentry) +{ + if (is_binderfs_control_device(dentry)) + return -EPERM; + + 
return simple_unlink(dir, dentry); +} + +static const struct file_operations binder_ctl_fops =3D { + .owner =3D THIS_MODULE, + .open =3D nonseekable_open, + .unlocked_ioctl =3D binder_ctl_ioctl, + .compat_ioctl =3D binder_ctl_ioctl, + .llseek =3D noop_llseek, +}; + +/** + * binderfs_binder_ctl_create - create a new binder-control device + * @sb: super block of the binderfs mount + * + * This function creates a new binder-control device node in the binderfs = mount + * referred to by @sb. + * + * Return: 0 on success, negative errno on failure + */ +static int binderfs_binder_ctl_create(struct super_block *sb) +{ + int minor, ret; + struct dentry *dentry; + struct binder_device *device; + struct inode *inode =3D NULL; + struct dentry *root =3D sb->s_root; + struct binderfs_info *info =3D sb->s_fs_info; +#if defined(CONFIG_IPC_NS) + bool use_reserve =3D (info->ipc_ns =3D=3D &init_ipc_ns); +#else + bool use_reserve =3D true; +#endif + + device =3D kzalloc(sizeof(*device), GFP_KERNEL); + if (!device) + return -ENOMEM; + + /* If we have already created a binder-control node, return. */ + if (info->control_dentry) { + ret =3D 0; + goto out; + } + + ret =3D -ENOMEM; + inode =3D new_inode(sb); + if (!inode) + goto out; + + /* Reserve a new minor number for the new device. */ + mutex_lock(&binderfs_minors_mutex); + minor =3D ida_alloc_max(&binderfs_minors, + use_reserve ? BINDERFS_MAX_MINOR : + BINDERFS_MAX_MINOR_CAPPED, + GFP_KERNEL); + mutex_unlock(&binderfs_minors_mutex); + if (minor < 0) { + ret =3D minor; + goto out; + } + + inode->i_ino =3D SECOND_INODE; + simple_inode_init_ts(inode); + init_special_inode(inode, S_IFCHR | 0600, + MKDEV(MAJOR(binderfs_dev), minor)); + inode->i_fop =3D &binder_ctl_fops; + inode->i_uid =3D info->root_uid; + inode->i_gid =3D info->root_gid; + + device->minor =3D minor; + device->ctx =3D NULL; + + dentry =3D d_alloc_name(root, "binder-control"); + if (!dentry) + goto out; + + inode->i_private =3D device; + info->control_dentry =3D dentry; + d_add(dentry, inode); + + return 0; + +out: + kfree(device); + iput(inode); + + return ret; +} + +static const struct inode_operations binderfs_dir_inode_operations =3D { + .lookup =3D simple_lookup, + .rename =3D binderfs_rename, + .unlink =3D binderfs_unlink, +}; + +static struct inode *binderfs_make_inode(struct super_block *sb, int mode) +{ + struct inode *ret; + + ret =3D new_inode(sb); + if (ret) { + ret->i_ino =3D iunique(sb, BINDERFS_MAX_MINOR + INODE_OFFSET); + ret->i_mode =3D mode; + simple_inode_init_ts(ret); + } + return ret; +} + +static struct dentry *binderfs_create_dentry(struct dentry *parent, + const char *name) +{ + struct dentry *dentry; + + dentry =3D lookup_noperm(&QSTR(name), parent); + if (IS_ERR(dentry)) + return dentry; + + /* Return error if the file/dir already exists. 
*/ + if (d_really_is_positive(dentry)) { + dput(dentry); + return ERR_PTR(-EEXIST); + } + + return dentry; +} + +void rust_binderfs_remove_file(struct dentry *dentry) +{ + struct inode *parent_inode; + + parent_inode =3D d_inode(dentry->d_parent); + inode_lock(parent_inode); + if (simple_positive(dentry)) { + dget(dentry); + simple_unlink(parent_inode, dentry); + d_delete(dentry); + dput(dentry); + } + inode_unlock(parent_inode); +} + +static struct dentry *rust_binderfs_create_file(struct dentry *parent, con= st char *name, + const struct file_operations *fops, + void *data) +{ + struct dentry *dentry; + struct inode *new_inode, *parent_inode; + struct super_block *sb; + + parent_inode =3D d_inode(parent); + inode_lock(parent_inode); + + dentry =3D binderfs_create_dentry(parent, name); + if (IS_ERR(dentry)) + goto out; + + sb =3D parent_inode->i_sb; + new_inode =3D binderfs_make_inode(sb, S_IFREG | 0444); + if (!new_inode) { + dput(dentry); + dentry =3D ERR_PTR(-ENOMEM); + goto out; + } + + new_inode->i_fop =3D fops; + new_inode->i_private =3D data; + d_instantiate(dentry, new_inode); + fsnotify_create(parent_inode, dentry); + +out: + inode_unlock(parent_inode); + return dentry; +} + +struct dentry *rust_binderfs_create_proc_file(struct inode *nodp, int pid) +{ + struct binderfs_info *info =3D nodp->i_sb->s_fs_info; + struct dentry *dir =3D info->proc_log_dir; + char strbuf[20 + 1]; + void *data =3D (void *)(unsigned long) pid; + + if (!dir) + return NULL; + + snprintf(strbuf, sizeof(strbuf), "%u", pid); + return rust_binderfs_create_file(dir, strbuf, &rust_binder_proc_fops, dat= a); +} + +static struct dentry *binderfs_create_dir(struct dentry *parent, + const char *name) +{ + struct dentry *dentry; + struct inode *new_inode, *parent_inode; + struct super_block *sb; + + parent_inode =3D d_inode(parent); + inode_lock(parent_inode); + + dentry =3D binderfs_create_dentry(parent, name); + if (IS_ERR(dentry)) + goto out; + + sb =3D parent_inode->i_sb; + new_inode =3D binderfs_make_inode(sb, S_IFDIR | 0755); + if (!new_inode) { + dput(dentry); + dentry =3D ERR_PTR(-ENOMEM); + goto out; + } + + new_inode->i_fop =3D &simple_dir_operations; + new_inode->i_op =3D &simple_dir_inode_operations; + + set_nlink(new_inode, 2); + d_instantiate(dentry, new_inode); + inc_nlink(parent_inode); + fsnotify_mkdir(parent_inode, dentry); + +out: + inode_unlock(parent_inode); + return dentry; +} + +static int binder_features_show(struct seq_file *m, void *unused) +{ + bool *feature =3D m->private; + + seq_printf(m, "%d\n", *feature); + + return 0; +} +DEFINE_SHOW_ATTRIBUTE(binder_features); + +static int init_binder_features(struct super_block *sb) +{ + struct dentry *dentry, *dir; + + dir =3D binderfs_create_dir(sb->s_root, "features"); + if (IS_ERR(dir)) + return PTR_ERR(dir); + + dentry =3D rust_binderfs_create_file(dir, "oneway_spam_detection", + &binder_features_fops, + &binder_features.oneway_spam_detection); + if (IS_ERR(dentry)) + return PTR_ERR(dentry); + + dentry =3D rust_binderfs_create_file(dir, "extended_error", + &binder_features_fops, + &binder_features.extended_error); + if (IS_ERR(dentry)) + return PTR_ERR(dentry); + + dentry =3D rust_binderfs_create_file(dir, "freeze_notification", + &binder_features_fops, + &binder_features.freeze_notification); + if (IS_ERR(dentry)) + return PTR_ERR(dentry); + + return 0; +} + +static int init_binder_logs(struct super_block *sb) +{ + struct dentry *binder_logs_root_dir, *dentry, *proc_log_dir; + struct binderfs_info *info; + int ret =3D 0; + + 
binder_logs_root_dir =3D binderfs_create_dir(sb->s_root, + "binder_logs"); + if (IS_ERR(binder_logs_root_dir)) { + ret =3D PTR_ERR(binder_logs_root_dir); + goto out; + } + + dentry =3D rust_binderfs_create_file(binder_logs_root_dir, "stats", + &rust_binder_stats_fops, NULL); + if (IS_ERR(dentry)) { + ret =3D PTR_ERR(dentry); + goto out; + } + + dentry =3D rust_binderfs_create_file(binder_logs_root_dir, "state", + &rust_binder_state_fops, NULL); + if (IS_ERR(dentry)) { + ret =3D PTR_ERR(dentry); + goto out; + } + + dentry =3D rust_binderfs_create_file(binder_logs_root_dir, "transactions", + &rust_binder_transactions_fops, NULL); + if (IS_ERR(dentry)) { + ret =3D PTR_ERR(dentry); + goto out; + } + + proc_log_dir =3D binderfs_create_dir(binder_logs_root_dir, "proc"); + if (IS_ERR(proc_log_dir)) { + ret =3D PTR_ERR(proc_log_dir); + goto out; + } + info =3D sb->s_fs_info; + info->proc_log_dir =3D proc_log_dir; + +out: + return ret; +} + +static int binderfs_fill_super(struct super_block *sb, struct fs_context *= fc) +{ + int ret; + struct binderfs_info *info; + struct binderfs_mount_opts *ctx =3D fc->fs_private; + struct inode *inode =3D NULL; + struct binderfs_device device_info =3D {}; + const char *name; + size_t len; + + sb->s_blocksize =3D PAGE_SIZE; + sb->s_blocksize_bits =3D PAGE_SHIFT; + + /* + * The binderfs filesystem can be mounted by userns root in a + * non-initial userns. By default such mounts have the SB_I_NODEV flag + * set in s_iflags to prevent security issues where userns root can + * just create random device nodes via mknod() since it owns the + * filesystem mount. But binderfs does not allow to create any files + * including devices nodes. The only way to create binder devices nodes + * is through the binder-control device which userns root is explicitly + * allowed to do. So removing the SB_I_NODEV flag from s_iflags is both + * necessary and safe. 
+ */ + sb->s_iflags &=3D ~SB_I_NODEV; + sb->s_iflags |=3D SB_I_NOEXEC; + sb->s_magic =3D RUST_BINDERFS_SUPER_MAGIC; + sb->s_op =3D &binderfs_super_ops; + sb->s_time_gran =3D 1; + + sb->s_fs_info =3D kzalloc(sizeof(struct binderfs_info), GFP_KERNEL); + if (!sb->s_fs_info) + return -ENOMEM; + info =3D sb->s_fs_info; + + info->ipc_ns =3D get_ipc_ns(current->nsproxy->ipc_ns); + + info->root_gid =3D make_kgid(sb->s_user_ns, 0); + if (!gid_valid(info->root_gid)) + info->root_gid =3D GLOBAL_ROOT_GID; + info->root_uid =3D make_kuid(sb->s_user_ns, 0); + if (!uid_valid(info->root_uid)) + info->root_uid =3D GLOBAL_ROOT_UID; + info->mount_opts.max =3D ctx->max; + info->mount_opts.stats_mode =3D ctx->stats_mode; + + inode =3D new_inode(sb); + if (!inode) + return -ENOMEM; + + inode->i_ino =3D FIRST_INODE; + inode->i_fop =3D &simple_dir_operations; + inode->i_mode =3D S_IFDIR | 0755; + simple_inode_init_ts(inode); + inode->i_op =3D &binderfs_dir_inode_operations; + set_nlink(inode, 2); + + sb->s_root =3D d_make_root(inode); + if (!sb->s_root) + return -ENOMEM; + + ret =3D binderfs_binder_ctl_create(sb); + if (ret) + return ret; + + name =3D rust_binder_devices_param; + for (len =3D strcspn(name, ","); len > 0; len =3D strcspn(name, ",")) { + strscpy(device_info.name, name, len + 1); + ret =3D binderfs_binder_device_create(inode, NULL, &device_info); + if (ret) + return ret; + name +=3D len; + if (*name =3D=3D ',') + name++; + } + + ret =3D init_binder_features(sb); + if (ret) + return ret; + + if (info->mount_opts.stats_mode =3D=3D binderfs_stats_mode_global) + return init_binder_logs(sb); + + return 0; +} + +static int binderfs_fs_context_get_tree(struct fs_context *fc) +{ + return get_tree_nodev(fc, binderfs_fill_super); +} + +static void binderfs_fs_context_free(struct fs_context *fc) +{ + struct binderfs_mount_opts *ctx =3D fc->fs_private; + + kfree(ctx); +} + +static const struct fs_context_operations binderfs_fs_context_ops =3D { + .free =3D binderfs_fs_context_free, + .get_tree =3D binderfs_fs_context_get_tree, + .parse_param =3D binderfs_fs_context_parse_param, + .reconfigure =3D binderfs_fs_context_reconfigure, +}; + +static int binderfs_init_fs_context(struct fs_context *fc) +{ + struct binderfs_mount_opts *ctx; + + ctx =3D kzalloc(sizeof(struct binderfs_mount_opts), GFP_KERNEL); + if (!ctx) + return -ENOMEM; + + ctx->max =3D BINDERFS_MAX_MINOR; + ctx->stats_mode =3D binderfs_stats_mode_unset; + + fc->fs_private =3D ctx; + fc->ops =3D &binderfs_fs_context_ops; + + return 0; +} + +static void binderfs_kill_super(struct super_block *sb) +{ + struct binderfs_info *info =3D sb->s_fs_info; + + /* + * During inode eviction struct binderfs_info is needed. + * So first wipe the super_block then free struct binderfs_info. + */ + kill_litter_super(sb); + + if (info && info->ipc_ns) + put_ipc_ns(info->ipc_ns); + + kfree(info); +} + +static struct file_system_type binder_fs_type =3D { + .name =3D "binder", + .init_fs_context =3D binderfs_init_fs_context, + .parameters =3D binderfs_fs_parameters, + .kill_sb =3D binderfs_kill_super, + .fs_flags =3D FS_USERNS_MOUNT, +}; + +int init_rust_binderfs(void) +{ + int ret; + const char *name; + size_t len; + + /* Verify that the default binderfs device names are valid. */ + name =3D rust_binder_devices_param; + for (len =3D strcspn(name, ","); len > 0; len =3D strcspn(name, ",")) { + if (len > BINDERFS_MAX_NAME) + return -E2BIG; + name +=3D len; + if (*name =3D=3D ',') + name++; + } + + /* Allocate new major number for binderfs. 
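+	 * Every binderfs instance hands out its device nodes on this major,
+	 * with minor numbers reserved from the global binderfs_minors IDA.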
*/ + ret =3D alloc_chrdev_region(&binderfs_dev, 0, BINDERFS_MAX_MINOR, + "rust_binder"); + if (ret) + return ret; + + ret =3D register_filesystem(&binder_fs_type); + if (ret) { + unregister_chrdev_region(binderfs_dev, BINDERFS_MAX_MINOR); + return ret; + } + + return ret; +} diff --git a/drivers/android/binder/stats.rs b/drivers/android/binder/stats= .rs new file mode 100644 index 0000000000000000000000000000000000000000..a83ec111d2cb50e8cf3282fd14e= 3ac004648658b --- /dev/null +++ b/drivers/android/binder/stats.rs @@ -0,0 +1,89 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. + +//! Keep track of statistics for binder_logs. + +use crate::defs::*; +use core::sync::atomic::{AtomicU32, Ordering::Relaxed}; +use kernel::{ioctl::_IOC_NR, seq_file::SeqFile, seq_print}; + +const BC_COUNT: usize =3D _IOC_NR(BC_REPLY_SG) as usize + 1; +const BR_COUNT: usize =3D _IOC_NR(BR_TRANSACTION_PENDING_FROZEN) as usize = + 1; + +pub(crate) static GLOBAL_STATS: BinderStats =3D BinderStats::new(); + +pub(crate) struct BinderStats { + bc: [AtomicU32; BC_COUNT], + br: [AtomicU32; BR_COUNT], +} + +impl BinderStats { + pub(crate) const fn new() -> Self { + #[expect(clippy::declare_interior_mutable_const)] + const ZERO: AtomicU32 =3D AtomicU32::new(0); + + Self { + bc: [ZERO; BC_COUNT], + br: [ZERO; BR_COUNT], + } + } + + pub(crate) fn inc_bc(&self, bc: u32) { + let idx =3D _IOC_NR(bc) as usize; + if let Some(bc_ref) =3D self.bc.get(idx) { + bc_ref.fetch_add(1, Relaxed); + } + } + + pub(crate) fn inc_br(&self, br: u32) { + let idx =3D _IOC_NR(br) as usize; + if let Some(br_ref) =3D self.br.get(idx) { + br_ref.fetch_add(1, Relaxed); + } + } + + pub(crate) fn debug_print(&self, prefix: &str, m: &SeqFile) { + for (i, cnt) in self.bc.iter().enumerate() { + let cnt =3D cnt.load(Relaxed); + if cnt > 0 { + seq_print!(m, "{}{}: {}\n", prefix, command_string(i), cnt= ); + } + } + for (i, cnt) in self.br.iter().enumerate() { + let cnt =3D cnt.load(Relaxed); + if cnt > 0 { + seq_print!(m, "{}{}: {}\n", prefix, return_string(i), cnt); + } + } + } +} + +mod strings { + use core::str::from_utf8_unchecked; + use kernel::str::CStr; + + extern "C" { + static binder_command_strings: [*const u8; super::BC_COUNT]; + static binder_return_strings: [*const u8; super::BR_COUNT]; + } + + pub(super) fn command_string(i: usize) -> &'static str { + // SAFETY: Accessing `binder_command_strings` is always safe. + let c_str_ptr =3D unsafe { binder_command_strings[i] }; + // SAFETY: The `binder_command_strings` array only contains nul-te= rminated strings. + let bytes =3D unsafe { CStr::from_char_ptr(c_str_ptr) }.as_bytes(); + // SAFETY: The `binder_command_strings` array only contains string= s with ascii-chars. + unsafe { from_utf8_unchecked(bytes) } + } + + pub(super) fn return_string(i: usize) -> &'static str { + // SAFETY: Accessing `binder_return_strings` is always safe. + let c_str_ptr =3D unsafe { binder_return_strings[i] }; + // SAFETY: The `binder_command_strings` array only contains nul-te= rminated strings. + let bytes =3D unsafe { CStr::from_char_ptr(c_str_ptr) }.as_bytes(); + // SAFETY: The `binder_command_strings` array only contains string= s with ascii-chars. 
+ unsafe { from_utf8_unchecked(bytes) } + } +} +use strings::{command_string, return_string}; diff --git a/drivers/android/binder/thread.rs b/drivers/android/binder/thre= ad.rs new file mode 100644 index 0000000000000000000000000000000000000000..7e34ccd394f8049bab88562ffb4= 601739aea670a --- /dev/null +++ b/drivers/android/binder/thread.rs @@ -0,0 +1,1596 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. + +//! This module defines the `Thread` type, which represents a userspace th= read that is using +//! binder. +//! +//! The `Process` object stores all of the threads in an rb tree. + +use kernel::{ + bindings, + fs::{File, LocalFile}, + list::{AtomicTracker, List, ListArc, ListLinks, TryNewListArc}, + prelude::*, + security, + seq_file::SeqFile, + seq_print, + sync::poll::{PollCondVar, PollTable}, + sync::{Arc, SpinLock}, + task::Task, + types::ARef, + uaccess::UserSlice, + uapi, +}; + +use crate::{ + allocation::{Allocation, AllocationView, BinderObject, BinderObjectRef= , NewAllocation}, + defs::*, + error::BinderResult, + process::{GetWorkOrRegister, Process}, + ptr_align, + stats::GLOBAL_STATS, + transaction::Transaction, + BinderReturnWriter, DArc, DLArc, DTRWrap, DeliverCode, DeliverToRead, +}; + +use core::{ + mem::size_of, + sync::atomic::{AtomicU32, Ordering}, +}; + +/// Stores the layout of the scatter-gather entries. This is used during t= he `translate_objects` +/// call and is discarded when it returns. +struct ScatterGatherState { + /// A struct that tracks the amount of unused buffer space. + unused_buffer_space: UnusedBufferSpace, + /// Scatter-gather entries to copy. + sg_entries: KVec, + /// Indexes into `sg_entries` corresponding to the last binder_buffer_= object that + /// was processed and all of its ancestors. The array is in sorted ord= er. + ancestors: KVec, +} + +/// This entry specifies an additional buffer that should be copied using = the scatter-gather +/// mechanism. +struct ScatterGatherEntry { + /// The index in the offset array of the BINDER_TYPE_PTR that this ent= ry originates from. + obj_index: usize, + /// Offset in target buffer. + offset: usize, + /// User address in source buffer. + sender_uaddr: usize, + /// Number of bytes to copy. + length: usize, + /// The minimum offset of the next fixup in this buffer. + fixup_min_offset: usize, + /// The offsets within this buffer that contain pointers which should = be translated. + pointer_fixups: KVec, +} + +/// This entry specifies that a fixup should happen at `target_offset` of = the +/// buffer. If `skip` is nonzero, then the fixup is a `binder_fd_array_obj= ect` +/// and is applied later. Otherwise if `skip` is zero, then the size of the +/// fixup is `sizeof::()` and `pointer_value` is written to the buffe= r. +struct PointerFixupEntry { + /// The number of bytes to skip, or zero for a `binder_buffer_object` = fixup. + skip: usize, + /// The translated pointer to write when `skip` is zero. + pointer_value: u64, + /// The offset at which the value should be written. The offset is rel= ative + /// to the original buffer. + target_offset: usize, +} + +/// Return type of `apply_and_validate_fixup_in_parent`. +struct ParentFixupInfo { + /// The index of the parent buffer in `sg_entries`. + parent_sg_index: usize, + /// The number of ancestors of the buffer. + /// + /// The buffer is considered an ancestor of itself, so this is always = at + /// least one. + num_ancestors: usize, + /// New value of `fixup_min_offset` if this fixup is applied. 
+ new_min_offset: usize, + /// The offset of the fixup in the target buffer. + target_offset: usize, +} + +impl ScatterGatherState { + /// Called when a `binder_buffer_object` or `binder_fd_array_object` t= ries + /// to access a region in its parent buffer. These accesses have vario= us + /// restrictions, which this method verifies. + /// + /// The `parent_offset` and `length` arguments describe the offset and + /// length of the access in the parent buffer. + /// + /// # Detailed restrictions + /// + /// Obviously the fixup must be in-bounds for the parent buffer. + /// + /// For safety reasons, we only allow fixups inside a buffer to happen + /// at increasing offsets; additionally, we only allow fixup on the la= st + /// buffer object that was verified, or one of its parents. + /// + /// Example of what is allowed: + /// + /// A + /// B (parent =3D A, offset =3D 0) + /// C (parent =3D A, offset =3D 16) + /// D (parent =3D C, offset =3D 0) + /// E (parent =3D A, offset =3D 32) // min_offset is 16 (C.parent_of= fset) + /// + /// Examples of what is not allowed: + /// + /// Decreasing offsets within the same parent: + /// A + /// C (parent =3D A, offset =3D 16) + /// B (parent =3D A, offset =3D 0) // decreasing offset within A + /// + /// Arcerring to a parent that wasn't the last object or any of its pa= rents: + /// A + /// B (parent =3D A, offset =3D 0) + /// C (parent =3D A, offset =3D 0) + /// C (parent =3D A, offset =3D 16) + /// D (parent =3D B, offset =3D 0) // B is not A or any of A's par= ents + fn validate_parent_fixup( + &self, + parent: usize, + parent_offset: usize, + length: usize, + ) -> Result { + // Using `position` would also be correct, but `rposition` avoids + // quadratic running times. + let ancestors_i =3D self + .ancestors + .iter() + .copied() + .rposition(|sg_idx| self.sg_entries[sg_idx].obj_index =3D=3D p= arent) + .ok_or(EINVAL)?; + let sg_idx =3D self.ancestors[ancestors_i]; + let sg_entry =3D match self.sg_entries.get(sg_idx) { + Some(sg_entry) =3D> sg_entry, + None =3D> { + pr_err!( + "self.ancestors[{}] is {}, but self.sg_entries.len() i= s {}", + ancestors_i, + sg_idx, + self.sg_entries.len() + ); + return Err(EINVAL); + } + }; + if sg_entry.fixup_min_offset > parent_offset { + pr_warn!( + "validate_parent_fixup: fixup_min_offset=3D{}, parent_offs= et=3D{}", + sg_entry.fixup_min_offset, + parent_offset + ); + return Err(EINVAL); + } + let new_min_offset =3D parent_offset.checked_add(length).ok_or(EIN= VAL)?; + if new_min_offset > sg_entry.length { + pr_warn!( + "validate_parent_fixup: new_min_offset=3D{}, sg_entry.leng= th=3D{}", + new_min_offset, + sg_entry.length + ); + return Err(EINVAL); + } + let target_offset =3D sg_entry.offset.checked_add(parent_offset).o= k_or(EINVAL)?; + // The `ancestors_i + 1` operation can't overflow since the output= of the addition is at + // most `self.ancestors.len()`, which also fits in a usize. + Ok(ParentFixupInfo { + parent_sg_index: sg_idx, + num_ancestors: ancestors_i + 1, + new_min_offset, + target_offset, + }) + } +} + +/// Keeps track of how much unused buffer space is left. The initial amoun= t is the number of bytes +/// requested by the user using the `buffers_size` field of `binder_transa= ction_data_sg`. Each time +/// we translate an object of type `BINDER_TYPE_PTR`, some of the unused b= uffer space is consumed. +struct UnusedBufferSpace { + /// The start of the remaining space. + offset: usize, + /// The end of the remaining space. 
+ limit: usize, +} +impl UnusedBufferSpace { + /// Claim the next `size` bytes from the unused buffer space. The offs= et for the claimed chunk + /// into the buffer is returned. + fn claim_next(&mut self, size: usize) -> Result { + // We require every chunk to be aligned. + let size =3D ptr_align(size).ok_or(EINVAL)?; + let new_offset =3D self.offset.checked_add(size).ok_or(EINVAL)?; + + if new_offset <=3D self.limit { + let offset =3D self.offset; + self.offset =3D new_offset; + Ok(offset) + } else { + Err(EINVAL) + } + } +} + +pub(crate) enum PushWorkRes { + Ok, + FailedDead(DLArc), +} + +impl PushWorkRes { + fn is_ok(&self) -> bool { + match self { + PushWorkRes::Ok =3D> true, + PushWorkRes::FailedDead(_) =3D> false, + } + } +} + +/// The fields of `Thread` protected by the spinlock. +struct InnerThread { + /// Determines the looper state of the thread. It is a bit-wise combin= ation of the constants + /// prefixed with `LOOPER_`. + looper_flags: u32, + + /// Determines whether the looper should return. + looper_need_return: bool, + + /// Determines if thread is dead. + is_dead: bool, + + /// Work item used to deliver error codes to the thread that started a= transaction. Stored here + /// so that it can be reused. + reply_work: DArc, + + /// Work item used to deliver error codes to the current thread. Store= d here so that it can be + /// reused. + return_work: DArc, + + /// Determines whether the work list below should be processed. When s= et to false, `work_list` + /// is treated as if it were empty. + process_work_list: bool, + /// List of work items to deliver to userspace. + work_list: List>, + current_transaction: Option>, + + /// Extended error information for this thread. + extended_error: ExtendedError, +} + +const LOOPER_REGISTERED: u32 =3D 0x01; +const LOOPER_ENTERED: u32 =3D 0x02; +const LOOPER_EXITED: u32 =3D 0x04; +const LOOPER_INVALID: u32 =3D 0x08; +const LOOPER_WAITING: u32 =3D 0x10; +const LOOPER_WAITING_PROC: u32 =3D 0x20; +const LOOPER_POLL: u32 =3D 0x40; + +impl InnerThread { + fn new() -> Result { + fn next_err_id() -> u32 { + static EE_ID: AtomicU32 =3D AtomicU32::new(0); + EE_ID.fetch_add(1, Ordering::Relaxed) + } + + Ok(Self { + looper_flags: 0, + looper_need_return: false, + is_dead: false, + process_work_list: false, + reply_work: ThreadError::try_new()?, + return_work: ThreadError::try_new()?, + work_list: List::new(), + current_transaction: None, + extended_error: ExtendedError::new(next_err_id(), BR_OK, 0), + }) + } + + fn pop_work(&mut self) -> Option> { + if !self.process_work_list { + return None; + } + + let ret =3D self.work_list.pop_front(); + self.process_work_list =3D !self.work_list.is_empty(); + ret + } + + fn push_work(&mut self, work: DLArc) -> PushWorkRes= { + if self.is_dead { + PushWorkRes::FailedDead(work) + } else { + self.work_list.push_back(work); + self.process_work_list =3D true; + PushWorkRes::Ok + } + } + + fn push_reply_work(&mut self, code: u32) { + if let Ok(work) =3D ListArc::try_from_arc(self.reply_work.clone())= { + work.set_error_code(code); + self.push_work(work); + } else { + pr_warn!("Thread reply work is already in use."); + } + } + + fn push_return_work(&mut self, reply: u32) { + if let Ok(work) =3D ListArc::try_from_arc(self.return_work.clone()= ) { + work.set_error_code(reply); + self.push_work(work); + } else { + pr_warn!("Thread return work is already in use."); + } + } + + /// Used to push work items that do not need to be processed immediate= ly and can wait until the + /// thread gets another work item. 
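+    /// Unlike `push_work`, this neither sets `process_work_list` nor checks
+    /// `is_dead`, so the item sits in the queue until some other event causes
+    /// the thread to process its work list.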
+ fn push_work_deferred(&mut self, work: DLArc) { + self.work_list.push_back(work); + } + + /// Fetches the transaction this thread can reply to. If the thread ha= s a pending transaction + /// (that it could respond to) but it has also issued a transaction, i= t must first wait for the + /// previously-issued transaction to complete. + /// + /// The `thread` parameter should be the thread containing this `Threa= dInner`. + fn pop_transaction_to_reply(&mut self, thread: &Thread) -> Result> { + let transaction =3D self.current_transaction.take().ok_or(EINVAL)?; + if core::ptr::eq(thread, transaction.from.as_ref()) { + self.current_transaction =3D Some(transaction); + return Err(EINVAL); + } + // Find a new current transaction for this thread. + self.current_transaction =3D transaction.find_from(thread).cloned(= ); + Ok(transaction) + } + + fn pop_transaction_replied(&mut self, transaction: &DArc)= -> bool { + match self.current_transaction.take() { + None =3D> false, + Some(old) =3D> { + if !Arc::ptr_eq(transaction, &old) { + self.current_transaction =3D Some(old); + return false; + } + self.current_transaction =3D old.clone_next(); + true + } + } + } + + fn looper_enter(&mut self) { + self.looper_flags |=3D LOOPER_ENTERED; + if self.looper_flags & LOOPER_REGISTERED !=3D 0 { + self.looper_flags |=3D LOOPER_INVALID; + } + } + + fn looper_register(&mut self, valid: bool) { + self.looper_flags |=3D LOOPER_REGISTERED; + if !valid || self.looper_flags & LOOPER_ENTERED !=3D 0 { + self.looper_flags |=3D LOOPER_INVALID; + } + } + + fn looper_exit(&mut self) { + self.looper_flags |=3D LOOPER_EXITED; + } + + /// Determines whether the thread is part of a pool, i.e., if it is a = looper. + fn is_looper(&self) -> bool { + self.looper_flags & (LOOPER_ENTERED | LOOPER_REGISTERED) !=3D 0 + } + + /// Determines whether the thread should attempt to fetch work items f= rom the process queue. + /// This is generally case when the thread is registered as a looper a= nd not part of a + /// transaction stack. But if there is local work, we want to return t= o userspace before we + /// deliver any remote work. + fn should_use_process_work_queue(&self) -> bool { + self.current_transaction.is_none() && !self.process_work_list && s= elf.is_looper() + } + + fn poll(&mut self) -> u32 { + self.looper_flags |=3D LOOPER_POLL; + if self.process_work_list || self.looper_need_return { + bindings::POLLIN + } else { + 0 + } + } +} + +/// This represents a thread that's used with binder. +#[pin_data] +pub(crate) struct Thread { + pub(crate) id: i32, + pub(crate) process: Arc, + pub(crate) task: ARef, + #[pin] + inner: SpinLock, + #[pin] + work_condvar: PollCondVar, + /// Used to insert this thread into the process' `ready_threads` list. + /// + /// INVARIANT: May never be used for any other list than the `self.pro= cess.ready_threads`. + #[pin] + links: ListLinks, + #[pin] + links_track: AtomicTracker, +} + +kernel::list::impl_list_arc_safe! { + impl ListArcSafe<0> for Thread { + tracked_by links_track: AtomicTracker; + } +} +kernel::list::impl_list_item! 
{ + impl ListItem<0> for Thread { + using ListLinks { self.links }; + } +} + +impl Thread { + pub(crate) fn new(id: i32, process: Arc) -> Result>= { + let inner =3D InnerThread::new()?; + + Arc::pin_init( + try_pin_init!(Thread { + id, + process, + task: ARef::from(&**kernel::current!()), + inner <- kernel::new_spinlock!(inner, "Thread::inner"), + work_condvar <- kernel::new_poll_condvar!("Thread::work_co= ndvar"), + links <- ListLinks::new(), + links_track <- AtomicTracker::new(), + }), + GFP_KERNEL, + ) + } + + #[inline(never)] + pub(crate) fn debug_print(self: &Arc, m: &SeqFile, print_all: bo= ol) -> Result<()> { + let inner =3D self.inner.lock(); + + if print_all || inner.current_transaction.is_some() || !inner.work= _list.is_empty() { + seq_print!( + m, + " thread {}: l {:02x} need_return {}\n", + self.id, + inner.looper_flags, + inner.looper_need_return, + ); + } + + let mut t_opt =3D inner.current_transaction.as_ref(); + while let Some(t) =3D t_opt { + if Arc::ptr_eq(&t.from, self) { + t.debug_print_inner(m, " outgoing transaction "); + t_opt =3D t.from_parent.as_ref(); + } else if Arc::ptr_eq(&t.to, &self.process) { + t.debug_print_inner(m, " incoming transaction "); + t_opt =3D t.find_from(self); + } else { + t.debug_print_inner(m, " bad transaction "); + t_opt =3D None; + } + } + + for work in &inner.work_list { + work.debug_print(m, " ", " pending transaction ")?; + } + Ok(()) + } + + pub(crate) fn get_extended_error(&self, data: UserSlice) -> Result { + let mut writer =3D data.writer(); + let ee =3D self.inner.lock().extended_error; + writer.write(&ee)?; + Ok(()) + } + + pub(crate) fn set_current_transaction(&self, transaction: DArc) { + self.inner.lock().current_transaction =3D Some(transaction); + } + + pub(crate) fn has_current_transaction(&self) -> bool { + self.inner.lock().current_transaction.is_some() + } + + /// Attempts to fetch a work item from the thread-local queue. The beh= aviour if the queue is + /// empty depends on `wait`: if it is true, the function waits for som= e work to be queued (or a + /// signal); otherwise it returns indicating that none is available. + fn get_work_local(self: &Arc, wait: bool) -> Result>> { + { + let mut inner =3D self.inner.lock(); + if inner.looper_need_return { + return Ok(inner.pop_work()); + } + } + + // Try once if the caller does not want to wait. + if !wait { + return self.inner.lock().pop_work().ok_or(EAGAIN).map(Some); + } + + // Loop waiting only on the local queue (i.e., not registering wit= h the process queue). + let mut inner =3D self.inner.lock(); + loop { + if let Some(work) =3D inner.pop_work() { + return Ok(Some(work)); + } + + inner.looper_flags |=3D LOOPER_WAITING; + let signal_pending =3D self.work_condvar.wait_interruptible_fr= eezable(&mut inner); + inner.looper_flags &=3D !LOOPER_WAITING; + + if signal_pending { + return Err(EINTR); + } + if inner.looper_need_return { + return Ok(None); + } + } + } + + /// Attempts to fetch a work item from the thread-local queue, falling= back to the process-wide + /// queue if none is available locally. + /// + /// This must only be called when the thread is not participating in a= transaction chain. If it + /// is, the local version (`get_work_local`) should be used instead. + fn get_work(self: &Arc, wait: bool) -> Result>> { + // Try to get work from the thread's work queue, using only a loca= l lock. 
+ { + let mut inner =3D self.inner.lock(); + if let Some(work) =3D inner.pop_work() { + return Ok(Some(work)); + } + if inner.looper_need_return { + drop(inner); + return Ok(self.process.get_work()); + } + } + + // If the caller doesn't want to wait, try to grab work from the p= rocess queue. + // + // We know nothing will have been queued directly to the thread qu= eue because it is not in + // a transaction and it is not in the process' ready list. + if !wait { + return self.process.get_work().ok_or(EAGAIN).map(Some); + } + + // Get work from the process queue. If none is available, atomical= ly register as ready. + let reg =3D match self.process.get_work_or_register(self) { + GetWorkOrRegister::Work(work) =3D> return Ok(Some(work)), + GetWorkOrRegister::Register(reg) =3D> reg, + }; + + let mut inner =3D self.inner.lock(); + loop { + if let Some(work) =3D inner.pop_work() { + return Ok(Some(work)); + } + + inner.looper_flags |=3D LOOPER_WAITING | LOOPER_WAITING_PROC; + let signal_pending =3D self.work_condvar.wait_interruptible_fr= eezable(&mut inner); + inner.looper_flags &=3D !(LOOPER_WAITING | LOOPER_WAITING_PROC= ); + + if signal_pending || inner.looper_need_return { + // We need to return now. We need to pull the thread off t= he list of ready threads + // (by dropping `reg`), then check the state again after i= t's off the list to + // ensure that something was not queued in the meantime. I= f something has been + // queued, we just return it (instead of the error). + drop(inner); + drop(reg); + + let res =3D match self.inner.lock().pop_work() { + Some(work) =3D> Ok(Some(work)), + None if signal_pending =3D> Err(EINTR), + None =3D> Ok(None), + }; + return res; + } + } + } + + /// Push the provided work item to be delivered to user space via this= thread. + /// + /// Returns whether the item was successfully pushed. This can only fa= il if the thread is dead. + pub(crate) fn push_work(&self, work: DLArc) -> Push= WorkRes { + let sync =3D work.should_sync_wakeup(); + + let res =3D self.inner.lock().push_work(work); + + if res.is_ok() { + if sync { + self.work_condvar.notify_sync(); + } else { + self.work_condvar.notify_one(); + } + } + + res + } + + /// Attempts to push to given work item to the thread if it's a looper= thread (i.e., if it's + /// part of a thread pool) and is alive. Otherwise, push the work item= to the process instead. + pub(crate) fn push_work_if_looper(&self, work: DLArc) -> BinderResult { + let mut inner =3D self.inner.lock(); + if inner.is_looper() && !inner.is_dead { + inner.push_work(work); + Ok(()) + } else { + drop(inner); + self.process.push_work(work) + } + } + + pub(crate) fn push_work_deferred(&self, work: DLArc= ) { + self.inner.lock().push_work_deferred(work); + } + + pub(crate) fn push_return_work(&self, reply: u32) { + self.inner.lock().push_return_work(reply); + } + + fn translate_object( + &self, + obj_index: usize, + offset: usize, + object: BinderObjectRef<'_>, + view: &mut AllocationView<'_>, + allow_fds: bool, + sg_state: &mut ScatterGatherState, + ) -> BinderResult { + match object { + BinderObjectRef::Binder(obj) =3D> { + let strong =3D obj.hdr.type_ =3D=3D BINDER_TYPE_BINDER; + // SAFETY: `binder` is a `binder_uintptr_t`; any bit patte= rn is a valid + // representation. 
+ let ptr =3D unsafe { obj.__bindgen_anon_1.binder } as _; + let cookie =3D obj.cookie as _; + let flags =3D obj.flags as _; + let node =3D self + .process + .as_arc_borrow() + .get_node(ptr, cookie, flags, strong, self)?; + security::binder_transfer_binder(&self.process.cred, &view= .alloc.process.cred)?; + view.transfer_binder_object(offset, obj, strong, node)?; + } + BinderObjectRef::Handle(obj) =3D> { + let strong =3D obj.hdr.type_ =3D=3D BINDER_TYPE_HANDLE; + // SAFETY: `handle` is a `u32`; any bit pattern is a valid= representation. + let handle =3D unsafe { obj.__bindgen_anon_1.handle } as _; + let node =3D self.process.get_node_from_handle(handle, str= ong)?; + security::binder_transfer_binder(&self.process.cred, &view= .alloc.process.cred)?; + view.transfer_binder_object(offset, obj, strong, node)?; + } + BinderObjectRef::Fd(obj) =3D> { + if !allow_fds { + return Err(EPERM.into()); + } + + // SAFETY: `fd` is a `u32`; any bit pattern is a valid rep= resentation. + let fd =3D unsafe { obj.__bindgen_anon_1.fd }; + let file =3D LocalFile::fget(fd)?; + // SAFETY: The binder driver never calls `fdget_pos` and t= his code runs from an + // ioctl, so there are no active calls to `fdget_pos` on t= his thread. + let file =3D unsafe { LocalFile::assume_no_fdget_pos(file)= }; + security::binder_transfer_file( + &self.process.cred, + &view.alloc.process.cred, + &file, + )?; + + let mut obj_write =3D BinderFdObject::default(); + obj_write.hdr.type_ =3D BINDER_TYPE_FD; + // This will be overwritten with the actual fd when the tr= ansaction is received. + obj_write.__bindgen_anon_1.fd =3D u32::MAX; + obj_write.cookie =3D obj.cookie; + view.write::(offset, &obj_write)?; + + const FD_FIELD_OFFSET: usize =3D + core::mem::offset_of!(uapi::binder_fd_object, __bindge= n_anon_1.fd); + + let field_offset =3D offset + FD_FIELD_OFFSET; + + view.alloc.info_add_fd(file, field_offset, false)?; + } + BinderObjectRef::Ptr(obj) =3D> { + let obj_length =3D obj.length.try_into().map_err(|_| EINVA= L)?; + let alloc_offset =3D match sg_state.unused_buffer_space.cl= aim_next(obj_length) { + Ok(alloc_offset) =3D> alloc_offset, + Err(err) =3D> { + pr_warn!( + "Failed to claim space for a BINDER_TYPE_PTR. = (offset: {}, limit: {}, size: {})", + sg_state.unused_buffer_space.offset, + sg_state.unused_buffer_space.limit, + obj_length, + ); + return Err(err.into()); + } + }; + + let sg_state_idx =3D sg_state.sg_entries.len(); + sg_state.sg_entries.push( + ScatterGatherEntry { + obj_index, + offset: alloc_offset, + sender_uaddr: obj.buffer as _, + length: obj_length, + pointer_fixups: KVec::new(), + fixup_min_offset: 0, + }, + GFP_KERNEL, + )?; + + let buffer_ptr_in_user_space =3D (view.alloc.ptr + alloc_o= ffset) as u64; + + if obj.flags & uapi::BINDER_BUFFER_FLAG_HAS_PARENT =3D=3D = 0 { + sg_state.ancestors.clear(); + sg_state.ancestors.push(sg_state_idx, GFP_KERNEL)?; + } else { + // Another buffer also has a pointer to this buffer, a= nd we need to fixup that + // pointer too. 
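+                    //
+                    // Illustrative example: if this object names parent entry P with
+                    // parent_offset 16, the eight bytes at offset 16 inside P's copied
+                    // buffer are patched below to hold this buffer's address in the
+                    // target mapping instead of the sender's pointer value.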
+ + let parent_index =3D usize::try_from(obj.parent).map_e= rr(|_| EINVAL)?; + let parent_offset =3D usize::try_from(obj.parent_offse= t).map_err(|_| EINVAL)?; + + let info =3D sg_state.validate_parent_fixup( + parent_index, + parent_offset, + size_of::(), + )?; + + sg_state.ancestors.truncate(info.num_ancestors); + sg_state.ancestors.push(sg_state_idx, GFP_KERNEL)?; + + let parent_entry =3D match sg_state.sg_entries.get_mut= (info.parent_sg_index) { + Some(parent_entry) =3D> parent_entry, + None =3D> { + pr_err!( + "validate_parent_fixup returned index out = of bounds for sg.entries" + ); + return Err(EINVAL.into()); + } + }; + + parent_entry.fixup_min_offset =3D info.new_min_offset; + parent_entry.pointer_fixups.push( + PointerFixupEntry { + skip: 0, + pointer_value: buffer_ptr_in_user_space, + target_offset: info.target_offset, + }, + GFP_KERNEL, + )?; + } + + let mut obj_write =3D BinderBufferObject::default(); + obj_write.hdr.type_ =3D BINDER_TYPE_PTR; + obj_write.flags =3D obj.flags; + obj_write.buffer =3D buffer_ptr_in_user_space; + obj_write.length =3D obj.length; + obj_write.parent =3D obj.parent; + obj_write.parent_offset =3D obj.parent_offset; + view.write::(offset, &obj_write)?; + } + BinderObjectRef::Fda(obj) =3D> { + if !allow_fds { + return Err(EPERM.into()); + } + let parent_index =3D usize::try_from(obj.parent).map_err(|= _| EINVAL)?; + let parent_offset =3D usize::try_from(obj.parent_offset).m= ap_err(|_| EINVAL)?; + let num_fds =3D usize::try_from(obj.num_fds).map_err(|_| E= INVAL)?; + let fds_len =3D num_fds.checked_mul(size_of::()).ok_o= r(EINVAL)?; + + let info =3D sg_state.validate_parent_fixup(parent_index, = parent_offset, fds_len)?; + view.alloc.info_add_fd_reserve(num_fds)?; + + sg_state.ancestors.truncate(info.num_ancestors); + let parent_entry =3D match sg_state.sg_entries.get_mut(inf= o.parent_sg_index) { + Some(parent_entry) =3D> parent_entry, + None =3D> { + pr_err!( + "validate_parent_fixup returned index out of b= ounds for sg.entries" + ); + return Err(EINVAL.into()); + } + }; + + parent_entry.fixup_min_offset =3D info.new_min_offset; + parent_entry + .pointer_fixups + .push( + PointerFixupEntry { + skip: fds_len, + pointer_value: 0, + target_offset: info.target_offset, + }, + GFP_KERNEL, + ) + .map_err(|_| ENOMEM)?; + + let fda_uaddr =3D parent_entry + .sender_uaddr + .checked_add(parent_offset) + .ok_or(EINVAL)?; + let mut fda_bytes =3D KVec::new(); + UserSlice::new(UserPtr::from_addr(fda_uaddr as _), fds_len) + .read_all(&mut fda_bytes, GFP_KERNEL)?; + + if fds_len !=3D fda_bytes.len() { + pr_err!("UserSlice::read_all returned wrong length in = BINDER_TYPE_FDA"); + return Err(EINVAL.into()); + } + + for i in (0..fds_len).step_by(size_of::()) { + let fd =3D { + let mut fd_bytes =3D [0u8; size_of::()]; + fd_bytes.copy_from_slice(&fda_bytes[i..i + size_of= ::()]); + u32::from_ne_bytes(fd_bytes) + }; + + let file =3D LocalFile::fget(fd)?; + // SAFETY: The binder driver never calls `fdget_pos` a= nd this code runs from an + // ioctl, so there are no active calls to `fdget_pos` = on this thread. + let file =3D unsafe { LocalFile::assume_no_fdget_pos(f= ile) }; + security::binder_transfer_file( + &self.process.cred, + &view.alloc.process.cred, + &file, + )?; + + // The `validate_parent_fixup` call ensuers that this = addition will not + // overflow. 
+ view.alloc.info_add_fd(file, info.target_offset + i, t= rue)?; + } + drop(fda_bytes); + + let mut obj_write =3D BinderFdArrayObject::default(); + obj_write.hdr.type_ =3D BINDER_TYPE_FDA; + obj_write.num_fds =3D obj.num_fds; + obj_write.parent =3D obj.parent; + obj_write.parent_offset =3D obj.parent_offset; + view.write::(offset, &obj_write)?; + } + } + Ok(()) + } + + fn apply_sg(&self, alloc: &mut Allocation, sg_state: &mut ScatterGathe= rState) -> BinderResult { + for sg_entry in &mut sg_state.sg_entries { + let mut end_of_previous_fixup =3D sg_entry.offset; + let offset_end =3D sg_entry.offset.checked_add(sg_entry.length= ).ok_or(EINVAL)?; + + let mut reader =3D + UserSlice::new(UserPtr::from_addr(sg_entry.sender_uaddr), = sg_entry.length).reader(); + for fixup in &mut sg_entry.pointer_fixups { + let fixup_len =3D if fixup.skip =3D=3D 0 { + size_of::() + } else { + fixup.skip + }; + + let target_offset_end =3D fixup.target_offset.checked_add(= fixup_len).ok_or(EINVAL)?; + if fixup.target_offset < end_of_previous_fixup || offset_e= nd < target_offset_end { + pr_warn!( + "Fixups oob {} {} {} {}", + fixup.target_offset, + end_of_previous_fixup, + offset_end, + target_offset_end + ); + return Err(EINVAL.into()); + } + + let copy_off =3D end_of_previous_fixup; + let copy_len =3D fixup.target_offset - end_of_previous_fix= up; + if let Err(err) =3D alloc.copy_into(&mut reader, copy_off,= copy_len) { + pr_warn!("Failed copying into alloc: {:?}", err); + return Err(err.into()); + } + if fixup.skip =3D=3D 0 { + let res =3D alloc.write::(fixup.target_offset, &f= ixup.pointer_value); + if let Err(err) =3D res { + pr_warn!("Failed copying ptr into alloc: {:?}", er= r); + return Err(err.into()); + } + } + if let Err(err) =3D reader.skip(fixup_len) { + pr_warn!("Failed skipping {} from reader: {:?}", fixup= _len, err); + return Err(err.into()); + } + end_of_previous_fixup =3D target_offset_end; + } + let copy_off =3D end_of_previous_fixup; + let copy_len =3D offset_end - end_of_previous_fixup; + if let Err(err) =3D alloc.copy_into(&mut reader, copy_off, cop= y_len) { + pr_warn!("Failed copying remainder into alloc: {:?}", err); + return Err(err.into()); + } + } + Ok(()) + } + + /// This method copies the payload of a transaction into the target pr= ocess. + /// + /// The resulting payload will have several different components, whic= h will be stored next to + /// each other in the allocation. Furthermore, various objects can be = embedded in the payload, + /// and those objects have to be translated so that they make sense to= the target transaction. 
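+    ///
+    /// The chunks are laid out back to back, each starting at a pointer-aligned
+    /// offset: the raw transaction data first, then the offsets array, then any
+    /// buffers claimed for `BINDER_TYPE_PTR` objects, and finally the security
+    /// context, if one was requested.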
+ pub(crate) fn copy_transaction_data( + &self, + to_process: Arc, + tr: &BinderTransactionDataSg, + debug_id: usize, + allow_fds: bool, + txn_security_ctx_offset: Option<&mut usize>, + ) -> BinderResult { + let trd =3D &tr.transaction_data; + let is_oneway =3D trd.flags & TF_ONE_WAY !=3D 0; + let mut secctx =3D if let Some(offset) =3D txn_security_ctx_offset= { + let secid =3D self.process.cred.get_secid(); + let ctx =3D match security::SecurityCtx::from_secid(secid) { + Ok(ctx) =3D> ctx, + Err(err) =3D> { + pr_warn!("Failed to get security ctx for id {}: {:?}",= secid, err); + return Err(err.into()); + } + }; + Some((offset, ctx)) + } else { + None + }; + + let data_size =3D trd.data_size.try_into().map_err(|_| EINVAL)?; + let aligned_data_size =3D ptr_align(data_size).ok_or(EINVAL)?; + let offsets_size =3D trd.offsets_size.try_into().map_err(|_| EINVA= L)?; + let aligned_offsets_size =3D ptr_align(offsets_size).ok_or(EINVAL)= ?; + let buffers_size =3D tr.buffers_size.try_into().map_err(|_| EINVAL= )?; + let aligned_buffers_size =3D ptr_align(buffers_size).ok_or(EINVAL)= ?; + let aligned_secctx_size =3D match secctx.as_ref() { + Some((_offset, ctx)) =3D> ptr_align(ctx.len()).ok_or(EINVAL)?, + None =3D> 0, + }; + + // This guarantees that at least `sizeof(usize)` bytes will be all= ocated. + let len =3D usize::max( + aligned_data_size + .checked_add(aligned_offsets_size) + .and_then(|sum| sum.checked_add(aligned_buffers_size)) + .and_then(|sum| sum.checked_add(aligned_secctx_size)) + .ok_or(ENOMEM)?, + size_of::(), + ); + let secctx_off =3D aligned_data_size + aligned_offsets_size + alig= ned_buffers_size; + let mut alloc =3D + match to_process.buffer_alloc(debug_id, len, is_oneway, self.p= rocess.task.pid()) { + Ok(alloc) =3D> alloc, + Err(err) =3D> { + pr_warn!( + "Failed to allocate buffer. len:{}, is_oneway:{}", + len, + is_oneway + ); + return Err(err); + } + }; + + // SAFETY: This accesses a union field, but it's okay because the = field's type is valid for + // all bit-patterns. + let trd_data_ptr =3D unsafe { &trd.data.ptr }; + let mut buffer_reader =3D + UserSlice::new(UserPtr::from_addr(trd_data_ptr.buffer as _), d= ata_size).reader(); + let mut end_of_previous_object =3D 0; + let mut sg_state =3D None; + + // Copy offsets if there are any. + if offsets_size > 0 { + { + let mut reader =3D + UserSlice::new(UserPtr::from_addr(trd_data_ptr.offsets= as _), offsets_size) + .reader(); + alloc.copy_into(&mut reader, aligned_data_size, offsets_si= ze)?; + } + + let offsets_start =3D aligned_data_size; + let offsets_end =3D aligned_data_size + aligned_offsets_size; + + // This state is used for BINDER_TYPE_PTR objects. + let sg_state =3D sg_state.insert(ScatterGatherState { + unused_buffer_space: UnusedBufferSpace { + offset: offsets_end, + limit: len, + }, + sg_entries: KVec::new(), + ancestors: KVec::new(), + }); + + // Traverse the objects specified. + let mut view =3D AllocationView::new(&mut alloc, data_size); + for (index, index_offset) in (offsets_start..offsets_end) + .step_by(size_of::()) + .enumerate() + { + let offset =3D view.alloc.read(index_offset)?; + + if offset < end_of_previous_object { + pr_warn!("Got transaction with invalid offset."); + return Err(EINVAL.into()); + } + + // Copy data between two objects. 
+ if end_of_previous_object < offset { + view.copy_into( + &mut buffer_reader, + end_of_previous_object, + offset - end_of_previous_object, + )?; + } + + let mut object =3D BinderObject::read_from(&mut buffer_rea= der)?; + + match self.translate_object( + index, + offset, + object.as_ref(), + &mut view, + allow_fds, + sg_state, + ) { + Ok(()) =3D> end_of_previous_object =3D offset + object= .size(), + Err(err) =3D> { + pr_warn!("Error while translating object."); + return Err(err); + } + } + + // Update the indexes containing objects to clean up. + let offset_after_object =3D index_offset + size_of::(); + view.alloc + .set_info_offsets(offsets_start..offset_after_object); + } + } + + // Copy remaining raw data. + alloc.copy_into( + &mut buffer_reader, + end_of_previous_object, + data_size - end_of_previous_object, + )?; + + if let Some(sg_state) =3D sg_state.as_mut() { + if let Err(err) =3D self.apply_sg(&mut alloc, sg_state) { + pr_warn!("Failure in apply_sg: {:?}", err); + return Err(err); + } + } + + if let Some((off_out, secctx)) =3D secctx.as_mut() { + if let Err(err) =3D alloc.write(secctx_off, secctx.as_bytes())= { + pr_warn!("Failed to write security context: {:?}", err); + return Err(err.into()); + } + **off_out =3D secctx_off; + } + Ok(alloc) + } + + fn unwind_transaction_stack(self: &Arc) { + let mut thread =3D self.clone(); + while let Ok(transaction) =3D { + let mut inner =3D thread.inner.lock(); + inner.pop_transaction_to_reply(thread.as_ref()) + } { + let reply =3D Err(BR_DEAD_REPLY); + if !transaction.from.deliver_single_reply(reply, &transaction)= { + break; + } + + thread =3D transaction.from.clone(); + } + } + + pub(crate) fn deliver_reply( + &self, + reply: Result, u32>, + transaction: &DArc, + ) { + if self.deliver_single_reply(reply, transaction) { + transaction.from.unwind_transaction_stack(); + } + } + + /// Delivers a reply to the thread that started a transaction. The rep= ly can either be a + /// reply-transaction or an error code to be delivered instead. + /// + /// Returns whether the thread is dead. If it is, the caller is expect= ed to unwind the + /// transaction stack by completing transactions for threads that are = dead. + fn deliver_single_reply( + &self, + reply: Result, u32>, + transaction: &DArc, + ) -> bool { + if let Ok(transaction) =3D &reply { + transaction.set_outstanding(&mut self.process.inner.lock()); + } + + { + let mut inner =3D self.inner.lock(); + if !inner.pop_transaction_replied(transaction) { + return false; + } + + if inner.is_dead { + return true; + } + + match reply { + Ok(work) =3D> { + inner.push_work(work); + } + Err(code) =3D> inner.push_reply_work(code), + } + } + + // Notify the thread now that we've released the inner lock. + self.work_condvar.notify_sync(); + false + } + + /// Determines if the given transaction is the current transaction for= this thread. + fn is_current_transaction(&self, transaction: &DArc) -> b= ool { + let inner =3D self.inner.lock(); + match &inner.current_transaction { + None =3D> false, + Some(current) =3D> Arc::ptr_eq(current, transaction), + } + } + + /// Determines the current top of the transaction stack. It fails if t= he top is in another + /// thread (i.e., this thread belongs to a stack but it has called ano= ther thread). The top is + /// [`None`] if the thread is not currently participating in a transac= tion stack. 
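+    /// This is used when queueing a new outgoing transaction: the transaction
+    /// being created is stacked on top of the value returned here.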
+ fn top_of_transaction_stack(&self) -> Result>= > { + let inner =3D self.inner.lock(); + if let Some(cur) =3D &inner.current_transaction { + if core::ptr::eq(self, cur.from.as_ref()) { + pr_warn!("got new transaction with bad transaction stack"); + return Err(EINVAL); + } + Ok(Some(cur.clone())) + } else { + Ok(None) + } + } + + fn transaction(self: &Arc, tr: &BinderTransactionDataSg, inne= r: T) + where + T: FnOnce(&Arc, &BinderTransactionDataSg) -> BinderResult, + { + if let Err(err) =3D inner(self, tr) { + if err.should_pr_warn() { + let mut ee =3D self.inner.lock().extended_error; + ee.command =3D err.reply; + ee.param =3D err.as_errno(); + pr_warn!( + "Transaction failed: {:?} my_pid:{}", + err, + self.process.pid_in_current_ns() + ); + } + + self.push_return_work(err.reply); + } + } + + fn transaction_inner(self: &Arc, tr: &BinderTransactionDataSg) -= > BinderResult { + // SAFETY: Handle's type has no invalid bit patterns. + let handle =3D unsafe { tr.transaction_data.target.handle }; + let node_ref =3D self.process.get_transaction_node(handle)?; + security::binder_transaction(&self.process.cred, &node_ref.node.ow= ner.cred)?; + // TODO: We need to ensure that there isn't a pending transaction = in the work queue. How + // could this happen? + let top =3D self.top_of_transaction_stack()?; + let list_completion =3D DTRWrap::arc_try_new(DeliverCode::new(BR_T= RANSACTION_COMPLETE))?; + let completion =3D list_completion.clone_arc(); + let transaction =3D Transaction::new(node_ref, top, self, tr)?; + + // Check that the transaction stack hasn't changed while the lock = was released, then update + // it with the new transaction. + { + let mut inner =3D self.inner.lock(); + if !transaction.is_stacked_on(&inner.current_transaction) { + pr_warn!("Transaction stack changed during transaction!"); + return Err(EINVAL.into()); + } + inner.current_transaction =3D Some(transaction.clone_arc()); + // We push the completion as a deferred work so that we wait f= or the reply before + // returning to userland. + inner.push_work_deferred(list_completion); + } + + if let Err(e) =3D transaction.submit() { + completion.skip(); + // Define `transaction` first to drop it after `inner`. + let transaction; + let mut inner =3D self.inner.lock(); + transaction =3D inner.current_transaction.take().unwrap(); + inner.current_transaction =3D transaction.clone_next(); + Err(e) + } else { + Ok(()) + } + } + + fn reply_inner(self: &Arc, tr: &BinderTransactionDataSg) -> Bind= erResult { + let orig =3D self.inner.lock().pop_transaction_to_reply(self)?; + if !orig.from.is_current_transaction(&orig) { + return Err(EINVAL.into()); + } + + // We need to complete the transaction even if we cannot complete = building the reply. + let out =3D (|| -> BinderResult<_> { + let completion =3D DTRWrap::arc_try_new(DeliverCode::new(BR_TR= ANSACTION_COMPLETE))?; + let process =3D orig.from.process.clone(); + let allow_fds =3D orig.flags & TF_ACCEPT_FDS !=3D 0; + let reply =3D Transaction::new_reply(self, process, tr, allow_= fds)?; + self.inner.lock().push_work(completion); + orig.from.deliver_reply(Ok(reply), &orig); + Ok(()) + })() + .map_err(|mut err| { + // At this point we only return `BR_TRANSACTION_COMPLETE` to t= he caller, and we must let + // the sender know that the transaction has completed (with an= error in this case). 
+ pr_warn!( + "Failure {:?} during reply - delivering BR_FAILED_REPLY to= sender.", + err + ); + let reply =3D Err(BR_FAILED_REPLY); + orig.from.deliver_reply(reply, &orig); + err.reply =3D BR_TRANSACTION_COMPLETE; + err + }); + + out + } + + fn oneway_transaction_inner(self: &Arc, tr: &BinderTransactionDa= taSg) -> BinderResult { + // SAFETY: The `handle` field is valid for all possible byte value= s, so reading from the + // union is okay. + let handle =3D unsafe { tr.transaction_data.target.handle }; + let node_ref =3D self.process.get_transaction_node(handle)?; + security::binder_transaction(&self.process.cred, &node_ref.node.ow= ner.cred)?; + let transaction =3D Transaction::new(node_ref, None, self, tr)?; + let code =3D if self.process.is_oneway_spam_detection_enabled() + && transaction.oneway_spam_detected + { + BR_ONEWAY_SPAM_SUSPECT + } else { + BR_TRANSACTION_COMPLETE + }; + let list_completion =3D DTRWrap::arc_try_new(DeliverCode::new(code= ))?; + let completion =3D list_completion.clone_arc(); + self.inner.lock().push_work(list_completion); + match transaction.submit() { + Ok(()) =3D> Ok(()), + Err(err) =3D> { + completion.skip(); + Err(err) + } + } + } + + fn write(self: &Arc, req: &mut BinderWriteRead) -> Result { + let write_start =3D req.write_buffer.wrapping_add(req.write_consum= ed); + let write_len =3D req.write_size.saturating_sub(req.write_consumed= ); + let mut reader =3D + UserSlice::new(UserPtr::from_addr(write_start as _), write_len= as _).reader(); + + while reader.len() >=3D size_of::() && self.inner.lock().retu= rn_work.is_unused() { + let before =3D reader.len(); + let cmd =3D reader.read::()?; + GLOBAL_STATS.inc_bc(cmd); + self.process.stats.inc_bc(cmd); + match cmd { + BC_TRANSACTION =3D> { + let tr =3D reader.read::()?.wit= h_buffers_size(0); + if tr.transaction_data.flags & TF_ONE_WAY !=3D 0 { + self.transaction(&tr, Self::oneway_transaction_inn= er); + } else { + self.transaction(&tr, Self::transaction_inner); + } + } + BC_TRANSACTION_SG =3D> { + let tr =3D reader.read::()?; + if tr.transaction_data.flags & TF_ONE_WAY !=3D 0 { + self.transaction(&tr, Self::oneway_transaction_inn= er); + } else { + self.transaction(&tr, Self::transaction_inner); + } + } + BC_REPLY =3D> { + let tr =3D reader.read::()?.wit= h_buffers_size(0); + self.transaction(&tr, Self::reply_inner) + } + BC_REPLY_SG =3D> { + let tr =3D reader.read::()?; + self.transaction(&tr, Self::reply_inner) + } + BC_FREE_BUFFER =3D> { + let buffer =3D self.process.buffer_get(reader.read()?); + if let Some(buffer) =3D &buffer { + if buffer.looper_need_return_on_free() { + self.inner.lock().looper_need_return =3D true; + } + } + drop(buffer); + } + BC_INCREFS =3D> { + self.process + .as_arc_borrow() + .update_ref(reader.read()?, true, false)? + } + BC_ACQUIRE =3D> { + self.process + .as_arc_borrow() + .update_ref(reader.read()?, true, true)? + } + BC_RELEASE =3D> { + self.process + .as_arc_borrow() + .update_ref(reader.read()?, false, true)? + } + BC_DECREFS =3D> { + self.process + .as_arc_borrow() + .update_ref(reader.read()?, false, false)? 
+ } + BC_INCREFS_DONE =3D> self.process.inc_ref_done(&mut reader= , false)?, + BC_ACQUIRE_DONE =3D> self.process.inc_ref_done(&mut reader= , true)?, + BC_REQUEST_DEATH_NOTIFICATION =3D> self.process.request_de= ath(&mut reader, self)?, + BC_CLEAR_DEATH_NOTIFICATION =3D> self.process.clear_death(= &mut reader, self)?, + BC_DEAD_BINDER_DONE =3D> self.process.dead_binder_done(rea= der.read()?, self), + BC_REGISTER_LOOPER =3D> { + let valid =3D self.process.register_thread(); + self.inner.lock().looper_register(valid); + } + BC_ENTER_LOOPER =3D> self.inner.lock().looper_enter(), + BC_EXIT_LOOPER =3D> self.inner.lock().looper_exit(), + BC_REQUEST_FREEZE_NOTIFICATION =3D> self.process.request_f= reeze_notif(&mut reader)?, + BC_CLEAR_FREEZE_NOTIFICATION =3D> self.process.clear_freez= e_notif(&mut reader)?, + BC_FREEZE_NOTIFICATION_DONE =3D> self.process.freeze_notif= _done(&mut reader)?, + + // Fail if given an unknown error code. + // BC_ATTEMPT_ACQUIRE and BC_ACQUIRE_RESULT are no longer = supported. + _ =3D> return Err(EINVAL), + } + // Update the number of write bytes consumed. + req.write_consumed +=3D (before - reader.len()) as u64; + } + + Ok(()) + } + + fn read(self: &Arc, req: &mut BinderWriteRead, wait: bool) -> Re= sult { + let read_start =3D req.read_buffer.wrapping_add(req.read_consumed); + let read_len =3D req.read_size.saturating_sub(req.read_consumed); + let mut writer =3D BinderReturnWriter::new( + UserSlice::new(UserPtr::from_addr(read_start as _), read_len a= s _).writer(), + self, + ); + let (in_pool, use_proc_queue) =3D { + let inner =3D self.inner.lock(); + (inner.is_looper(), inner.should_use_process_work_queue()) + }; + + let getter =3D if use_proc_queue { + Self::get_work + } else { + Self::get_work_local + }; + + // Reserve some room at the beginning of the read buffer so that w= e can send a + // BR_SPAWN_LOOPER if we need to. + let mut has_noop_placeholder =3D false; + if req.read_consumed =3D=3D 0 { + if let Err(err) =3D writer.write_code(BR_NOOP) { + pr_warn!("Failure when writing BR_NOOP at beginning of buf= fer."); + return Err(err); + } + has_noop_placeholder =3D true; + } + + // Loop doing work while there is room in the buffer. + let initial_len =3D writer.len(); + while writer.len() >=3D size_of::() + 4 { + match getter(self, wait && initial_len =3D=3D writer.len()) { + Ok(Some(work)) =3D> match work.into_arc().do_work(self, &m= ut writer) { + Ok(true) =3D> {} + Ok(false) =3D> break, + Err(err) =3D> { + return Err(err); + } + }, + Ok(None) =3D> { + break; + } + Err(err) =3D> { + // Propagate the error if we haven't written anything = else. + if err !=3D EINTR && err !=3D EAGAIN { + pr_warn!("Failure in work getter: {:?}", err); + } + if initial_len =3D=3D writer.len() { + return Err(err); + } else { + break; + } + } + } + } + + req.read_consumed +=3D read_len - writer.len() as u64; + + // Write BR_SPAWN_LOOPER if the process needs more threads for its= pool. + if has_noop_placeholder && in_pool && self.process.needs_thread() { + let mut writer =3D + UserSlice::new(UserPtr::from_addr(req.read_buffer as _), r= eq.read_size as _) + .writer(); + writer.write(&BR_SPAWN_LOOPER)?; + } + Ok(()) + } + + pub(crate) fn write_read(self: &Arc, data: UserSlice, wait: bool= ) -> Result { + let (mut reader, mut writer) =3D data.reader_writer(); + let mut req =3D reader.read::()?; + + // Go through the write buffer. 
+ let mut ret =3D Ok(()); + if req.write_size > 0 { + ret =3D self.write(&mut req); + if let Err(err) =3D ret { + pr_warn!( + "Write failure {:?} in pid:{}", + err, + self.process.pid_in_current_ns() + ); + req.read_consumed =3D 0; + writer.write(&req)?; + self.inner.lock().looper_need_return =3D false; + return ret; + } + } + + // Go through the work queue. + if req.read_size > 0 { + ret =3D self.read(&mut req, wait); + if ret.is_err() && ret !=3D Err(EINTR) { + pr_warn!( + "Read failure {:?} in pid:{}", + ret, + self.process.pid_in_current_ns() + ); + } + } + + // Write the request back so that the consumed fields are visible = to the caller. + writer.write(&req)?; + + self.inner.lock().looper_need_return =3D false; + + ret + } + + pub(crate) fn poll(&self, file: &File, table: PollTable<'_>) -> (bool,= u32) { + table.register_wait(file, &self.work_condvar); + let mut inner =3D self.inner.lock(); + (inner.should_use_process_work_queue(), inner.poll()) + } + + /// Make the call to `get_work` or `get_work_local` return immediately= , if any. + pub(crate) fn exit_looper(&self) { + let mut inner =3D self.inner.lock(); + let should_notify =3D inner.looper_flags & LOOPER_WAITING !=3D 0; + if should_notify { + inner.looper_need_return =3D true; + } + drop(inner); + + if should_notify { + self.work_condvar.notify_one(); + } + } + + pub(crate) fn notify_if_poll_ready(&self, sync: bool) { + // Determine if we need to notify. This requires the lock. + let inner =3D self.inner.lock(); + let notify =3D inner.looper_flags & LOOPER_POLL !=3D 0 && inner.sh= ould_use_process_work_queue(); + drop(inner); + + // Now that the lock is no longer held, notify the waiters if we h= ave to. + if notify { + if sync { + self.work_condvar.notify_sync(); + } else { + self.work_condvar.notify_one(); + } + } + } + + pub(crate) fn release(self: &Arc) { + self.inner.lock().is_dead =3D true; + + //self.work_condvar.clear(); + self.unwind_transaction_stack(); + + // Cancel all pending work items. + while let Ok(Some(work)) =3D self.get_work_local(false) { + work.into_arc().cancel(); + } + } +} + +#[pin_data] +struct ThreadError { + error_code: AtomicU32, + #[pin] + links_track: AtomicTracker, +} + +impl ThreadError { + fn try_new() -> Result> { + DTRWrap::arc_pin_init(pin_init!(Self { + error_code: AtomicU32::new(BR_OK), + links_track <- AtomicTracker::new(), + })) + .map(ListArc::into_arc) + } + + fn set_error_code(&self, code: u32) { + self.error_code.store(code, Ordering::Relaxed); + } + + fn is_unused(&self) -> bool { + self.error_code.load(Ordering::Relaxed) =3D=3D BR_OK + } +} + +impl DeliverToRead for ThreadError { + fn do_work( + self: DArc, + _thread: &Thread, + writer: &mut BinderReturnWriter<'_>, + ) -> Result { + let code =3D self.error_code.load(Ordering::Relaxed); + self.error_code.store(BR_OK, Ordering::Relaxed); + writer.write_code(code)?; + Ok(true) + } + + fn cancel(self: DArc) {} + + fn should_sync_wakeup(&self) -> bool { + false + } + + fn debug_print(&self, m: &SeqFile, prefix: &str, _tprefix: &str) -> Re= sult<()> { + seq_print!( + m, + "{}transaction error: {}\n", + prefix, + self.error_code.load(Ordering::Relaxed) + ); + Ok(()) + } +} + +kernel::list::impl_list_arc_safe! 
{ + impl ListArcSafe<0> for ThreadError { + tracked_by links_track: AtomicTracker; + } +} diff --git a/drivers/android/binder/trace.rs b/drivers/android/binder/trace= .rs new file mode 100644 index 0000000000000000000000000000000000000000..af0e4392805e7ef2a39b7c1e557= ea233dcd810ab --- /dev/null +++ b/drivers/android/binder/trace.rs @@ -0,0 +1,16 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. + +use kernel::ffi::{c_uint, c_ulong}; +use kernel::tracepoint::declare_trace; + +declare_trace! { + unsafe fn rust_binder_ioctl(cmd: c_uint, arg: c_ulong); +} + +#[inline] +pub(crate) fn trace_ioctl(cmd: u32, arg: usize) { + // SAFETY: Always safe to call. + unsafe { rust_binder_ioctl(cmd, arg as c_ulong) } +} diff --git a/drivers/android/binder/transaction.rs b/drivers/android/binder= /transaction.rs new file mode 100644 index 0000000000000000000000000000000000000000..02512175d6229535373f2d3e543= ba8c91ecd72f0 --- /dev/null +++ b/drivers/android/binder/transaction.rs @@ -0,0 +1,456 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2025 Google LLC. + +use core::sync::atomic::{AtomicBool, Ordering}; +use kernel::{ + prelude::*, + seq_file::SeqFile, + seq_print, + sync::{Arc, SpinLock}, + task::Kuid, + time::{Instant, Monotonic}, + types::ScopeGuard, +}; + +use crate::{ + allocation::{Allocation, TranslatedFds}, + defs::*, + error::{BinderError, BinderResult}, + node::{Node, NodeRef}, + process::{Process, ProcessInner}, + ptr_align, + thread::{PushWorkRes, Thread}, + BinderReturnWriter, DArc, DLArc, DTRWrap, DeliverToRead, +}; + +#[pin_data(PinnedDrop)] +pub(crate) struct Transaction { + pub(crate) debug_id: usize, + target_node: Option>, + pub(crate) from_parent: Option>, + pub(crate) from: Arc, + pub(crate) to: Arc, + #[pin] + allocation: SpinLock>, + is_outstanding: AtomicBool, + code: u32, + pub(crate) flags: u32, + data_size: usize, + offsets_size: usize, + data_address: usize, + sender_euid: Kuid, + txn_security_ctx_off: Option, + pub(crate) oneway_spam_detected: bool, + start_time: Instant, +} + +kernel::list::impl_list_arc_safe! 
{ + impl ListArcSafe<0> for Transaction { untracked; } +} + +impl Transaction { + pub(crate) fn new( + node_ref: NodeRef, + from_parent: Option>, + from: &Arc, + tr: &BinderTransactionDataSg, + ) -> BinderResult> { + let debug_id =3D super::next_debug_id(); + let trd =3D &tr.transaction_data; + let allow_fds =3D node_ref.node.flags & FLAT_BINDER_FLAG_ACCEPTS_F= DS !=3D 0; + let txn_security_ctx =3D node_ref.node.flags & FLAT_BINDER_FLAG_TX= N_SECURITY_CTX !=3D 0; + let mut txn_security_ctx_off =3D if txn_security_ctx { Some(0) } e= lse { None }; + let to =3D node_ref.node.owner.clone(); + let mut alloc =3D match from.copy_transaction_data( + to.clone(), + tr, + debug_id, + allow_fds, + txn_security_ctx_off.as_mut(), + ) { + Ok(alloc) =3D> alloc, + Err(err) =3D> { + if !err.is_dead() { + pr_warn!("Failure in copy_transaction_data: {:?}", err= ); + } + return Err(err); + } + }; + let oneway_spam_detected =3D alloc.oneway_spam_detected; + if trd.flags & TF_ONE_WAY !=3D 0 { + if from_parent.is_some() { + pr_warn!("Oneway transaction should not be in a transactio= n stack."); + return Err(EINVAL.into()); + } + alloc.set_info_oneway_node(node_ref.node.clone()); + } + if trd.flags & TF_CLEAR_BUF !=3D 0 { + alloc.set_info_clear_on_drop(); + } + let target_node =3D node_ref.node.clone(); + alloc.set_info_target_node(node_ref); + let data_address =3D alloc.ptr; + + Ok(DTRWrap::arc_pin_init(pin_init!(Transaction { + debug_id, + target_node: Some(target_node), + from_parent, + sender_euid: from.process.task.euid(), + from: from.clone(), + to, + code: trd.code, + flags: trd.flags, + data_size: trd.data_size as _, + offsets_size: trd.offsets_size as _, + data_address, + allocation <- kernel::new_spinlock!(Some(alloc.success()), "Tr= ansaction::new"), + is_outstanding: AtomicBool::new(false), + txn_security_ctx_off, + oneway_spam_detected, + start_time: Instant::now(), + }))?) + } + + pub(crate) fn new_reply( + from: &Arc, + to: Arc, + tr: &BinderTransactionDataSg, + allow_fds: bool, + ) -> BinderResult> { + let debug_id =3D super::next_debug_id(); + let trd =3D &tr.transaction_data; + let mut alloc =3D match from.copy_transaction_data(to.clone(), tr,= debug_id, allow_fds, None) + { + Ok(alloc) =3D> alloc, + Err(err) =3D> { + pr_warn!("Failure in copy_transaction_data: {:?}", err); + return Err(err); + } + }; + let oneway_spam_detected =3D alloc.oneway_spam_detected; + if trd.flags & TF_CLEAR_BUF !=3D 0 { + alloc.set_info_clear_on_drop(); + } + Ok(DTRWrap::arc_pin_init(pin_init!(Transaction { + debug_id, + target_node: None, + from_parent: None, + sender_euid: from.process.task.euid(), + from: from.clone(), + to, + code: trd.code, + flags: trd.flags, + data_size: trd.data_size as _, + offsets_size: trd.offsets_size as _, + data_address: alloc.ptr, + allocation <- kernel::new_spinlock!(Some(alloc.success()), "Tr= ansaction::new"), + is_outstanding: AtomicBool::new(false), + txn_security_ctx_off: None, + oneway_spam_detected, + start_time: Instant::now(), + }))?) 
+ } + + #[inline(never)] + pub(crate) fn debug_print_inner(&self, m: &SeqFile, prefix: &str) { + seq_print!( + m, + "{}{}: from {}:{} to {} code {:x} flags {:x} elapsed {}ms", + prefix, + self.debug_id, + self.from.process.task.pid(), + self.from.id, + self.to.task.pid(), + self.code, + self.flags, + self.start_time.elapsed().as_millis(), + ); + if let Some(target_node) =3D &self.target_node { + seq_print!(m, " node {}", target_node.debug_id); + } + seq_print!(m, " size {}:{}\n", self.data_size, self.offsets_size); + } + + /// Determines if the transaction is stacked on top of the given trans= action. + pub(crate) fn is_stacked_on(&self, onext: &Option>) -> bool= { + match (&self.from_parent, onext) { + (None, None) =3D> true, + (Some(from_parent), Some(next)) =3D> Arc::ptr_eq(from_parent, = next), + _ =3D> false, + } + } + + /// Returns a pointer to the next transaction on the transaction stack= , if there is one. + pub(crate) fn clone_next(&self) -> Option> { + Some(self.from_parent.as_ref()?.clone()) + } + + /// Searches in the transaction stack for a thread that belongs to the= target process. This is + /// useful when finding a target for a new transaction: if the node be= longs to a process that + /// is already part of the transaction stack, we reuse the thread. + fn find_target_thread(&self) -> Option> { + let mut it =3D &self.from_parent; + while let Some(transaction) =3D it { + if Arc::ptr_eq(&transaction.from.process, &self.to) { + return Some(transaction.from.clone()); + } + it =3D &transaction.from_parent; + } + None + } + + /// Searches in the transaction stack for a transaction originating at= the given thread. + pub(crate) fn find_from(&self, thread: &Thread) -> Option<&DArc> { + let mut it =3D &self.from_parent; + while let Some(transaction) =3D it { + if core::ptr::eq(thread, transaction.from.as_ref()) { + return Some(transaction); + } + + it =3D &transaction.from_parent; + } + None + } + + pub(crate) fn set_outstanding(&self, to_process: &mut ProcessInner) { + // No race because this method is only called once. + if !self.is_outstanding.load(Ordering::Relaxed) { + self.is_outstanding.store(true, Ordering::Relaxed); + to_process.add_outstanding_txn(); + } + } + + /// Decrement `outstanding_txns` in `to` if it hasn't already been dec= remented. + fn drop_outstanding_txn(&self) { + // No race because this is called at most twice, and one of the ca= lls are in the + // destructor, which is guaranteed to not race with any other oper= ations on the + // transaction. It also cannot race with `set_outstanding`, since = submission happens + // before delivery. + if self.is_outstanding.load(Ordering::Relaxed) { + self.is_outstanding.store(false, Ordering::Relaxed); + self.to.drop_outstanding_txn(); + } + } + + /// Submits the transaction to a work queue. Uses a thread if there is= one in the transaction + /// stack, otherwise uses the destination process. + /// + /// Not used for replies. + pub(crate) fn submit(self: DLArc) -> BinderResult { + // Defined before `process_inner` so that the destructor runs afte= r releasing the lock. 
+ let mut _t_outdated; + + let oneway =3D self.flags & TF_ONE_WAY !=3D 0; + let process =3D self.to.clone(); + let mut process_inner =3D process.inner.lock(); + + self.set_outstanding(&mut process_inner); + + if oneway { + if let Some(target_node) =3D self.target_node.clone() { + if process_inner.is_frozen { + process_inner.async_recv =3D true; + if self.flags & TF_UPDATE_TXN !=3D 0 { + if let Some(t_outdated) =3D + target_node.take_outdated_transaction(&self, &= mut process_inner) + { + // Save the transaction to be dropped after lo= cks are released. + _t_outdated =3D t_outdated; + } + } + } + match target_node.submit_oneway(self, &mut process_inner) { + Ok(()) =3D> {} + Err((err, work)) =3D> { + drop(process_inner); + // Drop work after releasing process lock. + drop(work); + return Err(err); + } + } + + if process_inner.is_frozen { + return Err(BinderError::new_frozen_oneway()); + } else { + return Ok(()); + } + } else { + pr_err!("Failed to submit oneway transaction to node."); + } + } + + if process_inner.is_frozen { + process_inner.sync_recv =3D true; + return Err(BinderError::new_frozen()); + } + + let res =3D if let Some(thread) =3D self.find_target_thread() { + match thread.push_work(self) { + PushWorkRes::Ok =3D> Ok(()), + PushWorkRes::FailedDead(me) =3D> Err((BinderError::new_dea= d(), me)), + } + } else { + process_inner.push_work(self) + }; + drop(process_inner); + + match res { + Ok(()) =3D> Ok(()), + Err((err, work)) =3D> { + // Drop work after releasing process lock. + drop(work); + Err(err) + } + } + } + + /// Check whether one oneway transaction can supersede another. + pub(crate) fn can_replace(&self, old: &Transaction) -> bool { + if self.from.process.task.pid() !=3D old.from.process.task.pid() { + return false; + } + + if self.flags & old.flags & (TF_ONE_WAY | TF_UPDATE_TXN) !=3D (TF_= ONE_WAY | TF_UPDATE_TXN) { + return false; + } + + let target_node_match =3D match (self.target_node.as_ref(), old.ta= rget_node.as_ref()) { + (None, None) =3D> true, + (Some(tn1), Some(tn2)) =3D> Arc::ptr_eq(tn1, tn2), + _ =3D> false, + }; + + self.code =3D=3D old.code && self.flags =3D=3D old.flags && target= _node_match + } + + fn prepare_file_list(&self) -> Result { + let mut alloc =3D self.allocation.lock().take().ok_or(ESRCH)?; + + match alloc.translate_fds() { + Ok(translated) =3D> { + *self.allocation.lock() =3D Some(alloc); + Ok(translated) + } + Err(err) =3D> { + // Free the allocation eagerly. + drop(alloc); + Err(err) + } + } + } +} + +impl DeliverToRead for Transaction { + fn do_work( + self: DArc, + thread: &Thread, + writer: &mut BinderReturnWriter<'_>, + ) -> Result { + let send_failed_reply =3D ScopeGuard::new(|| { + if self.target_node.is_some() && self.flags & TF_ONE_WAY =3D= =3D 0 { + let reply =3D Err(BR_FAILED_REPLY); + self.from.deliver_reply(reply, &self); + } + self.drop_outstanding_txn(); + }); + + let files =3D if let Ok(list) =3D self.prepare_file_list() { + list + } else { + // On failure to process the list, we send a reply back to the= sender and ignore the + // transaction on the recipient. 
+ return Ok(true); + }; + + let mut tr_sec =3D BinderTransactionDataSecctx::default(); + let tr =3D tr_sec.tr_data(); + if let Some(target_node) =3D &self.target_node { + let (ptr, cookie) =3D target_node.get_id(); + tr.target.ptr =3D ptr as _; + tr.cookie =3D cookie as _; + }; + tr.code =3D self.code; + tr.flags =3D self.flags; + tr.data_size =3D self.data_size as _; + tr.data.ptr.buffer =3D self.data_address as _; + tr.offsets_size =3D self.offsets_size as _; + if tr.offsets_size > 0 { + tr.data.ptr.offsets =3D (self.data_address + ptr_align(self.da= ta_size).unwrap()) as _; + } + tr.sender_euid =3D self.sender_euid.into_uid_in_current_ns(); + tr.sender_pid =3D 0; + if self.target_node.is_some() && self.flags & TF_ONE_WAY =3D=3D 0 { + // Not a reply and not one-way. + tr.sender_pid =3D self.from.process.pid_in_current_ns(); + } + let code =3D if self.target_node.is_none() { + BR_REPLY + } else if self.txn_security_ctx_off.is_some() { + BR_TRANSACTION_SEC_CTX + } else { + BR_TRANSACTION + }; + + // Write the transaction code and data to the user buffer. + writer.write_code(code)?; + if let Some(off) =3D self.txn_security_ctx_off { + tr_sec.secctx =3D (self.data_address + off) as u64; + writer.write_payload(&tr_sec)?; + } else { + writer.write_payload(&*tr)?; + } + + let mut alloc =3D self.allocation.lock().take().ok_or(ESRCH)?; + + // Dismiss the completion of transaction with a failure. No failur= e paths are allowed from + // here on out. + send_failed_reply.dismiss(); + + // Commit files, and set FDs in FDA to be closed on buffer free. + let close_on_free =3D files.commit(); + alloc.set_info_close_on_free(close_on_free); + + // It is now the user's responsibility to clear the allocation. + alloc.keep_alive(); + + self.drop_outstanding_txn(); + + // When this is not a reply and not a oneway transaction, update `= current_transaction`. If + // it's a reply, `current_transaction` has already been updated ap= propriately. + if self.target_node.is_some() && tr_sec.transaction_data.flags & T= F_ONE_WAY =3D=3D 0 { + thread.set_current_transaction(self); + } + + Ok(false) + } + + fn cancel(self: DArc) { + let allocation =3D self.allocation.lock().take(); + drop(allocation); + + // If this is not a reply or oneway transaction, then send a dead = reply. 
+ if self.target_node.is_some() && self.flags & TF_ONE_WAY =3D=3D 0 { + let reply =3D Err(BR_DEAD_REPLY); + self.from.deliver_reply(reply, &self); + } + + self.drop_outstanding_txn(); + } + + fn should_sync_wakeup(&self) -> bool { + self.flags & TF_ONE_WAY =3D=3D 0 + } + + fn debug_print(&self, m: &SeqFile, _prefix: &str, tprefix: &str) -> Re= sult<()> { + self.debug_print_inner(m, tprefix); + Ok(()) + } +} + +#[pinned_drop] +impl PinnedDrop for Transaction { + fn drop(self: Pin<&mut Self>) { + self.drop_outstanding_txn(); + } +} diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/andro= id/binder.h index 1fd92021a573aab833291f92e167152e36f9b69c..03ee4c7010d70bac5ac06b56907= 3c25b9971767e 100644 --- a/include/uapi/linux/android/binder.h +++ b/include/uapi/linux/android/binder.h @@ -38,7 +38,7 @@ enum { BINDER_TYPE_PTR =3D B_PACK_CHARS('p', 't', '*', B_TYPE_LARGE), }; =20 -enum { +enum flat_binder_object_flags { FLAT_BINDER_FLAG_PRIORITY_MASK =3D 0xff, FLAT_BINDER_FLAG_ACCEPTS_FDS =3D 0x100, =20 diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helpe= r.h index 84d60635e8a9baef1f1a1b2752dc0fa044f8542f..9b3a4ab95818c937d5f520c88e5= 6697d6efdf1d1 100644 --- a/rust/bindings/bindings_helper.h +++ b/rust/bindings/bindings_helper.h @@ -50,6 +50,7 @@ #include #include #include +#include #include #include #include @@ -71,6 +72,7 @@ #include #include #include +#include #include #include #include @@ -99,3 +101,9 @@ const xa_mark_t RUST_CONST_HELPER_XA_PRESENT =3D XA_PRES= ENT; =20 const gfp_t RUST_CONST_HELPER_XA_FLAGS_ALLOC =3D XA_FLAGS_ALLOC; const gfp_t RUST_CONST_HELPER_XA_FLAGS_ALLOC1 =3D XA_FLAGS_ALLOC1; + +#if IS_ENABLED(CONFIG_ANDROID_BINDER_IPC_RUST) +#include "../../drivers/android/binder/rust_binder.h" +#include "../../drivers/android/binder/rust_binder_events.h" +#include "../../drivers/android/binder/page_range_helper.h" +#endif diff --git a/rust/helpers/binder.c b/rust/helpers/binder.c new file mode 100644 index 0000000000000000000000000000000000000000..224d38a92f1d985d78767d5a72f= 5ff60765b8508 --- /dev/null +++ b/rust/helpers/binder.c @@ -0,0 +1,26 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Copyright (C) 2025 Google LLC. 
+ */ + +#include +#include + +unsigned long rust_helper_list_lru_count(struct list_lru *lru) +{ + return list_lru_count(lru); +} + +unsigned long rust_helper_list_lru_walk(struct list_lru *lru, + list_lru_walk_cb isolate, void *cb_arg, + unsigned long nr_to_walk) +{ + return list_lru_walk(lru, isolate, cb_arg, nr_to_walk); +} + +void rust_helper_init_task_work(struct callback_head *twork, + task_work_func_t func) +{ + init_task_work(twork, func); +} diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c index 7cf7fe95e41dd51717050648d6160bebebdf4b26..8e8277bdddcaeec1edebe18ffc4= fe831c08a8455 100644 --- a/rust/helpers/helpers.c +++ b/rust/helpers/helpers.c @@ -8,6 +8,7 @@ */ =20 #include "auxiliary.c" +#include "binder.c" #include "blk.c" #include "bug.c" #include "build_assert.c" diff --git a/rust/helpers/page.c b/rust/helpers/page.c index b3f2b8fbf87fc9aa89cb1636736c52be16411301..7144de5a61dbdb3006a668961cd= 1b09440e74908 100644 --- a/rust/helpers/page.c +++ b/rust/helpers/page.c @@ -2,6 +2,7 @@ =20 #include #include +#include =20 struct page *rust_helper_alloc_pages(gfp_t gfp_mask, unsigned int order) { @@ -17,3 +18,10 @@ void rust_helper_kunmap_local(const void *addr) { kunmap_local(addr); } + +#ifndef NODE_NOT_IN_PAGE_FLAGS +int rust_helper_page_to_nid(const struct page *page) +{ + return page_to_nid(page); +} +#endif diff --git a/rust/helpers/security.c b/rust/helpers/security.c index 0c4c2065df28e7c6dc8982c6df44d47c57bf29e6..ca22da09548dfed95a83168ed09= 263e75cf08fd2 100644 --- a/rust/helpers/security.c +++ b/rust/helpers/security.c @@ -17,4 +17,28 @@ void rust_helper_security_release_secctx(struct lsm_cont= ext *cp) { security_release_secctx(cp); } + +int rust_helper_security_binder_set_context_mgr(const struct cred *mgr) +{ + return security_binder_set_context_mgr(mgr); +} + +int rust_helper_security_binder_transaction(const struct cred *from, + const struct cred *to) +{ + return security_binder_transaction(from, to); +} + +int rust_helper_security_binder_transfer_binder(const struct cred *from, + const struct cred *to) +{ + return security_binder_transfer_binder(from, to); +} + +int rust_helper_security_binder_transfer_file(const struct cred *from, + const struct cred *to, + const struct file *file) +{ + return security_binder_transfer_file(from, to, file); +} #endif diff --git a/rust/kernel/cred.rs b/rust/kernel/cred.rs index 2599f01e8b285f2106aefd27c315ae2aff25293c..3aa2e4c4a50c99864106d93d573= 498b0202f024e 100644 --- a/rust/kernel/cred.rs +++ b/rust/kernel/cred.rs @@ -54,6 +54,12 @@ pub unsafe fn from_ptr<'a>(ptr: *const bindings::cred) -= > &'a Credential { unsafe { &*ptr.cast() } } =20 + /// Returns a raw pointer to the inner credential. + #[inline] + pub fn as_ptr(&self) -> *const bindings::cred { + self.0.get() + } + /// Get the id for this security context. #[inline] pub fn get_secid(&self) -> u32 { diff --git a/rust/kernel/page.rs b/rust/kernel/page.rs index 7c1b17246ed5e88cb122c6aa594d1d4b86b8349b..811fe30e8e6ff1bd432e7929025= 6ee0b950320e2 100644 --- a/rust/kernel/page.rs +++ b/rust/kernel/page.rs @@ -85,6 +85,12 @@ pub fn as_ptr(&self) -> *mut bindings::page { self.page.as_ptr() } =20 + /// Get the node id containing this page. + pub fn nid(&self) -> i32 { + // SAFETY: Always safe to call with a valid page. + unsafe { bindings::page_to_nid(self.as_ptr()) } + } + /// Runs a piece of code with this page mapped to an address. /// /// The page is unmapped when this call returns. 
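A note for reviewers on the error convention bridged by the security_binder_*() helpers added above: the LSM hooks return 0 when the operation is allowed and a negative errno when it is denied, and the Rust wrappers in the next hunk surface that as a Result via to_result(). The standalone sketch below (plain std Rust, not the kernel crate) models only that convention; Error, to_result and fake_security_hook are invented stand-ins for illustration, not kernel APIs.

// Standalone model of the "0 or negative errno" convention used by the
// security_binder_*() hooks above. All names here are stand-ins invented
// for illustration; the in-tree code uses kernel::error::to_result().
#[derive(Debug, PartialEq)]
struct Error(i32); // models kernel::error::Error (holds a positive errno)

fn to_result(ret: i32) -> Result<(), Error> {
    // Zero means the LSM allowed the operation; a negative value is -errno.
    if ret < 0 {
        Err(Error(-ret))
    } else {
        Ok(())
    }
}

// Stand-in for a C hook such as security_binder_transaction().
fn fake_security_hook(allowed: bool) -> i32 {
    if allowed {
        0
    } else {
        -1 // would be -EPERM in the kernel
    }
}

fn main() {
    assert_eq!(to_result(fake_security_hook(true)), Ok(()));
    assert_eq!(to_result(fake_security_hook(false)), Err(Error(1)));
}

This is only meant to make the next hunk easier to read; the in-tree wrappers return the kernel's Result type and do nothing beyond calling to_result() on the hook's return value.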
diff --git a/rust/kernel/security.rs b/rust/kernel/security.rs
index 0c63e9e7e564b7d9d85865e5415dd0464e9a9098..9d271695265fb4635038e9e36c975cebb38d6782 100644
--- a/rust/kernel/security.rs
+++ b/rust/kernel/security.rs
@@ -8,9 +8,46 @@
 
 use crate::{
     bindings,
+    cred::Credential,
     error::{to_result, Result},
+    fs::File,
 };
 
+/// Calls the security modules to determine if the given task can become the manager of a binder
+/// context.
+#[inline]
+pub fn binder_set_context_mgr(mgr: &Credential) -> Result {
+    // SAFETY: `mgr.0` is valid because the shared reference guarantees a nonzero refcount.
+    to_result(unsafe { bindings::security_binder_set_context_mgr(mgr.as_ptr()) })
+}
+
+/// Calls the security modules to determine if binder transactions are allowed from task `from` to
+/// task `to`.
+#[inline]
+pub fn binder_transaction(from: &Credential, to: &Credential) -> Result {
+    // SAFETY: `from` and `to` are valid because the shared references guarantee nonzero refcounts.
+    to_result(unsafe { bindings::security_binder_transaction(from.as_ptr(), to.as_ptr()) })
+}
+
+/// Calls the security modules to determine if task `from` is allowed to send binder objects
+/// (owned by itself or other processes) to task `to` through a binder transaction.
+#[inline]
+pub fn binder_transfer_binder(from: &Credential, to: &Credential) -> Result {
+    // SAFETY: `from` and `to` are valid because the shared references guarantee nonzero refcounts.
+    to_result(unsafe { bindings::security_binder_transfer_binder(from.as_ptr(), to.as_ptr()) })
+}
+
+/// Calls the security modules to determine if task `from` is allowed to send the given file to
+/// task `to` (which would get its own file descriptor) through a binder transaction.
+#[inline]
+pub fn binder_transfer_file(from: &Credential, to: &Credential, file: &File) -> Result {
+    // SAFETY: `from`, `to` and `file` are valid because the shared references guarantee nonzero
+    // refcounts.
+    to_result(unsafe {
+        bindings::security_binder_transfer_file(from.as_ptr(), to.as_ptr(), file.as_ptr())
+    })
+}
+
 /// A security context string.
 ///
 /// # Invariants
diff --git a/rust/uapi/uapi_helper.h b/rust/uapi/uapi_helper.h
index 1409441359f510236256bc17851f9aac65c45c4e..de3562b08d0c3e379a4bfd180c3a4260c48c71b2 100644
--- a/rust/uapi/uapi_helper.h
+++ b/rust/uapi/uapi_helper.h
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include

---
base-commit: 9441d6b876529d519547a1ed3af5a08b05bd0339
change-id: 20250918-rust-binder-d80251ab47cb

Best regards,
-- 
Alice Ryhl
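For readers following the BINDER_WRITE_READ handling in thread.rs above: the driver advances write_consumed only by the bytes actually parsed in each loop iteration (before - reader.len()), so userspace can resume exactly where parsing stopped. The standalone sketch below models only that bookkeeping with invented types (BinderWriteRead, Reader); it is not the driver code and omits command dispatch entirely.

// Standalone model (invented types) of the consumed-bytes accounting in
// Thread::write(): parse fixed-size commands from the unconsumed part of
// the write buffer and account for exactly what was parsed.
struct BinderWriteRead {
    write_size: u64,
    write_consumed: u64,
}

struct Reader<'a> {
    buf: &'a [u8],
}

impl<'a> Reader<'a> {
    fn len(&self) -> usize {
        self.buf.len()
    }

    // Models reader.read::<u32>(): take one little-endian command word.
    fn read_u32(&mut self) -> Option<u32> {
        if self.buf.len() < 4 {
            return None;
        }
        let (head, tail) = self.buf.split_at(4);
        self.buf = tail;
        Some(u32::from_le_bytes([head[0], head[1], head[2], head[3]]))
    }
}

fn write(req: &mut BinderWriteRead, data: &[u8]) {
    let start = req.write_consumed as usize;
    let end = req.write_size as usize;
    let mut reader = Reader { buf: &data[start..end] };
    while reader.len() >= 4 {
        let before = reader.len();
        let _cmd = reader.read_u32().unwrap();
        // ... a real implementation would dispatch on the command here ...
        // Only the bytes parsed in this iteration are accounted for, so a
        // partial trailing command leaves write_consumed untouched.
        req.write_consumed += (before - reader.len()) as u64;
    }
}

fn main() {
    let data = [1u8, 0, 0, 0, 2, 0, 0, 0, 9]; // two commands plus a stray byte
    let mut req = BinderWriteRead {
        write_size: data.len() as u64,
        write_consumed: 0,
    };
    write(&mut req, &data);
    assert_eq!(req.write_consumed, 8); // the stray trailing byte stays unconsumed
}

The read side of the driver performs the mirror-image accounting on read_consumed by comparing the writer's remaining capacity before and after the work loop.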