[PATCH v1] add binder genl for txn report

Li Li posted 1 patch 1 year, 5 months ago
drivers/android/Makefile            |   2 +-
drivers/android/binder.c            | 125 +++++++++++++++++-
drivers/android/binder_genl.c       | 197 ++++++++++++++++++++++++++++
drivers/android/binder_genl.h       |  17 +++
drivers/android/binder_internal.h   |  13 ++
drivers/android/binder_trace.h      |  37 ++++++
include/uapi/linux/android/binder.h | 147 +++++++++++++++++++++
7 files changed, 534 insertions(+), 4 deletions(-)
create mode 100644 drivers/android/binder_genl.c
create mode 100644 drivers/android/binder_genl.h
[PATCH v1] add binder genl for txn report
Posted by Li Li 1 year, 5 months ago
From: Li Li <dualli@google.com>

Frozen tasks can't process binder transactions, so sync binder
transactions will fail with BR_FROZEN_REPLY and async binder
transactions will be queued in the kernel async binder buffer.
As these queued async transactions accumulates over time, the async
buffer will eventually be running out, denying all new transactions
after that with BR_FAILED_REPLY.

In addition to the above cases, different kinds of binder error codes
might be returned to the sender. However, the core Linux, or Android,
system administration process never knows what's actually happening.

This patch introduces the Linux generic netlink messages into the binder
driver so that the Linux/Android system administration process can
listen to important events and take corresponding actions, like stopping
a broken app from attacking the OS by sending huge amount of spamming
binder transactions.

To prevent making the already bloated binder.c even bigger, a new source
file binder_genl.c is created to host those generic netlink code.

Usage example (user space pseudo code):

	// enable binder report from the interested binder context(s)
	struct binder_report_info info = {0, BINDER_REPORT_ALL};
	ioctl(binder1, BINDER_ENABLE_REPORT, &info);
	ioctl(binder2, BINDER_ENABLE_REPORT, &info);

	// set optional per-process report, overriding the global one
	struct binder_report_info info2;
	info2.pid = getpid();
	info2.flags = BINDER_REPORT_FAILED | BINDER_REPORT_OVERRIDE;
	ioctl(binder2, BINDER_ENABLE_REPORT, &info2);

	// open netlink socket
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);

	// bind netlink socket
	bind(fd, struct socketaddr);

	// get the family id of binder generic netlink
	send(fd, CTRL_CMD_GETFAMILY, CTRL_ATTR_FAMILY_NAME);
	int id = recv(CTRL_ATTR_FAMILY_ID);

	// register the current process to receive binder reports
	send(fd, id, BINDER_GENL_CMD_REGISTER);

	// verify registration by reading back the registered pid
	recv(fd, BINDER_GENL_ATTR_PID, &pid);

	// wait and read all binder reports
	while (running) {
		struct binder_report report;
		recv(fd, BINDER_GENL_ATTR_REPORT, &report);

		// process struct binder_report
		do_something(&report);
	}

	// clean up
	close(fd);

Signed-off-by: Li Li <dualli@google.com>
---
 drivers/android/Makefile            |   2 +-
 drivers/android/binder.c            | 125 +++++++++++++++++-
 drivers/android/binder_genl.c       | 197 ++++++++++++++++++++++++++++
 drivers/android/binder_genl.h       |  17 +++
 drivers/android/binder_internal.h   |  13 ++
 drivers/android/binder_trace.h      |  37 ++++++
 include/uapi/linux/android/binder.h | 147 +++++++++++++++++++++
 7 files changed, 534 insertions(+), 4 deletions(-)
 create mode 100644 drivers/android/binder_genl.c
 create mode 100644 drivers/android/binder_genl.h

diff --git a/drivers/android/Makefile b/drivers/android/Makefile
index c9d3d0c99c25..d818447fbc4c 100644
--- a/drivers/android/Makefile
+++ b/drivers/android/Makefile
@@ -2,5 +2,5 @@
 ccflags-y += -I$(src)			# needed for trace events
 
 obj-$(CONFIG_ANDROID_BINDERFS)		+= binderfs.o
-obj-$(CONFIG_ANDROID_BINDER_IPC)	+= binder.o binder_alloc.o
+obj-$(CONFIG_ANDROID_BINDER_IPC)	+= binder.o binder_alloc.o binder_genl.o
 obj-$(CONFIG_ANDROID_BINDER_IPC_SELFTEST) += binder_alloc_selftest.o
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 905290c98c3c..4cd8301a28cb 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -72,6 +72,7 @@
 
 #include <linux/cacheflush.h>
 
+#include "binder_genl.h"
 #include "binder_internal.h"
 #include "binder_trace.h"
 
@@ -3676,10 +3677,27 @@ static void binder_transaction(struct binder_proc *proc,
 		return_error_line = __LINE__;
 		goto err_copy_data_failed;
 	}
-	if (t->buffer->oneway_spam_suspect)
+	if (t->buffer->oneway_spam_suspect) {
 		tcomplete->type = BINDER_WORK_TRANSACTION_ONEWAY_SPAM_SUSPECT;
-	else
+		if (binder_genl_report_enabled(proc, BINDER_REPORT_SPAM)) {
+			struct binder_report report;
+
+			strscpy(report.name, context->name,
+				sizeof(report.name));
+			report.err = BR_ONEWAY_SPAM_SUSPECT;
+			report.from_pid = proc->pid;
+			report.from_tid = thread->pid;
+			report.to_pid = target_proc ? target_proc->pid : 0;
+			report.to_tid = target_thread ? target_thread->pid : 0;
+			report.reply = reply;
+			report.flags = tr->flags;
+			report.code = tr->code;
+			report.data_size = tr->data_size;
+			binder_genl_send_report(&report, sizeof(report));
+		}
+	} else {
 		tcomplete->type = BINDER_WORK_TRANSACTION_COMPLETE;
+	}
 	t->work.type = BINDER_WORK_TRANSACTION;
 
 	if (reply) {
@@ -3729,8 +3747,25 @@ static void binder_transaction(struct binder_proc *proc,
 		 * process and is put in a pending queue, waiting for the target
 		 * process to be unfrozen.
 		 */
-		if (return_error == BR_TRANSACTION_PENDING_FROZEN)
+		if (return_error == BR_TRANSACTION_PENDING_FROZEN) {
 			tcomplete->type = BINDER_WORK_TRANSACTION_PENDING;
+			if (binder_genl_report_enabled(proc, BINDER_REPORT_DELAYED)) {
+				struct binder_report report;
+
+				strscpy(report.name, context->name,
+					sizeof(report.name));
+				report.err = return_error;
+				report.from_pid = proc->pid;
+				report.from_tid = thread->pid;
+				report.to_pid = target_proc ? target_proc->pid : 0;
+				report.to_tid = target_thread ? target_thread->pid : 0;
+				report.reply = reply;
+				report.flags = tr->flags;
+				report.code = tr->code;
+				report.data_size = tr->data_size;
+				binder_genl_send_report(&report, sizeof(report));
+			}
+		}
 		binder_enqueue_thread_work(thread, tcomplete);
 		if (return_error &&
 		    return_error != BR_TRANSACTION_PENDING_FROZEN)
@@ -3792,6 +3827,22 @@ static void binder_transaction(struct binder_proc *proc,
 		binder_dec_node_tmpref(target_node);
 	}
 
+	if (binder_genl_report_enabled(proc, BINDER_REPORT_FAILED)) {
+		struct binder_report report;
+
+		strscpy(report.name, context->name, sizeof(report.name));
+		report.err = return_error;
+		report.from_pid = proc->pid;
+		report.from_tid = thread->pid;
+		report.to_pid = target_proc ? target_proc->pid : 0;
+		report.to_tid = target_thread ? target_thread->pid : 0;
+		report.reply = reply;
+		report.flags = tr->flags;
+		report.code = tr->code;
+		report.data_size = tr->data_size;
+		binder_genl_send_report(&report, sizeof(report));
+	}
+
 	binder_debug(BINDER_DEBUG_FAILED_TRANSACTION,
 		     "%d:%d transaction %s to %d:%d failed %d/%d/%d, size %lld-%lld line %d\n",
 		     proc->pid, thread->pid, reply ? "reply" :
@@ -5415,6 +5466,59 @@ static int binder_ioctl_get_extended_error(struct binder_thread *thread,
 	return 0;
 }
 
+/**
+ * binder_ioctl_enable_report() - set binder report flags
+ * @proc:	the binder_proc calling the ioctl
+ * @info:	contains the target pid and the flags to set
+ *
+ * If pid is 0, the flags are applied to the whole binder context.
+ * Otherwise, the flags are applied to the specific process only.
+ */
+static int binder_ioctl_enable_report(struct binder_proc *proc,
+				      struct binder_report_info *info)
+{
+	int pid;
+	u32 flags;
+	bool found;
+	struct binder_context *context;
+	struct binder_proc *target_proc;
+
+	pid = info->pid;
+	flags = info->flags;
+	context = proc->context;
+
+	if (flags != (flags & (BINDER_REPORT_ALL | BINDER_REPORT_OVERRIDE))) {
+		pr_err("Invalid binder report flags: %u\n", flags);
+		return -EINVAL;
+	}
+
+	if (!pid) {
+		/* Set the global flags for the whole binder context */
+
+		pr_info("Set global binder report flags %u for %s\n",
+			flags, context->name);
+		context->report_flags = flags;
+	} else {
+		/* Set the per-process flags */
+
+		found = false;
+		mutex_lock(&binder_procs_lock);
+		hlist_for_each_entry(target_proc, &binder_procs, proc_node) {
+			if (target_proc->pid == pid) {
+				found = true;
+				pr_info("Set binder report flags %u for %d\n",
+					flags, pid);
+				target_proc->report_flags = flags;
+			}
+		}
+		mutex_unlock(&binder_procs_lock);
+		if (!found)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
 static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
 	int ret;
@@ -5623,6 +5727,18 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		if (ret < 0)
 			goto err;
 		break;
+	case BINDER_ENABLE_REPORT: {
+		struct binder_report_info info;
+
+		if (copy_from_user(&info, ubuf, sizeof(info))) {
+			ret = -EFAULT;
+			goto err;
+		}
+		ret = binder_ioctl_enable_report(proc, &info);
+		if (ret)
+			goto err;
+		break;
+	}
 	default:
 		ret = -EINVAL;
 		goto err;
@@ -6652,6 +6768,9 @@ static int __init binder_init(void)
 	const struct binder_debugfs_entry *db_entry;
 
 	ret = binder_alloc_shrinker_init();
+	if (ret)
+		return ret;
+	ret = binder_genl_init();
 	if (ret)
 		return ret;
 
diff --git a/drivers/android/binder_genl.c b/drivers/android/binder_genl.c
new file mode 100644
index 000000000000..4ed8f7d0492b
--- /dev/null
+++ b/drivers/android/binder_genl.c
@@ -0,0 +1,197 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* binder_genl.c
+ *
+ * Android IPC Subsystem
+ *
+ * Copyright (C) 2024 Google, Inc.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/skbuff.h>
+#include <net/sock.h>
+#include <net/genetlink.h>
+#include <uapi/linux/android/binder.h>
+
+#include "binder_genl.h"
+#include "binder_trace.h"
+
+/**
+ * The registered process that would receive binder reports.
+ */
+static u32 binder_report_pid;
+
+/**
+ * The policy to verify the type of the binder genl data
+ */
+static const struct nla_policy binder_report_policy[BINDER_GENL_ATTR_MAX + 1] = {
+	[BINDER_GENL_ATTR_PID] = { .type = NLA_U32 },
+};
+
+static struct genl_family binder_gnl_family;
+
+/**
+ * binder_genl_cmd_register() - register the pid to receive binder genl reports
+ * @info:	the generic netlink struct passed from netlink driver
+ * @pid:	the process to receive binder genl reports
+ *
+ * Called by .doit to register the pid, which is then replied to the caller to
+ * complete the handshake.
+ */
+static int binder_genl_cmd_register(struct genl_info *info, u32 pid)
+{
+	int len;
+	struct sk_buff *skb;
+	void *hdr;
+
+	len = sizeof(pid);
+	skb = genlmsg_new(len, GFP_KERNEL);
+	if (!skb) {
+		pr_err("Failed to alloc binder genl message\n");
+		return -ENOMEM;
+	}
+
+	hdr = genlmsg_put(skb, pid, 0, &binder_gnl_family, 0,
+			  BINDER_GENL_CMD_REPLY);
+	if (!hdr) {
+		pr_err("Failed to set binder genl header\n");
+		kfree_skb(skb);
+		return -EMSGSIZE;
+	}
+
+	if (nla_put(skb, BINDER_GENL_ATTR_PID, len, &pid)) {
+		genlmsg_cancel(skb, hdr);
+		nlmsg_free(skb);
+		return -EMSGSIZE;
+	}
+
+	genlmsg_end(skb, hdr);
+
+	if (genlmsg_reply(skb, info)) {
+		pr_err("Failed to send binder genl message\n");
+		return -EFAULT;
+	}
+
+	binder_report_pid = pid;
+
+	return 0;
+}
+
+/**
+ * binder_genl_cmd_doit() - the .doit handler of binder genl commands
+ * @skb:	the metadata struct passed from netlink driver
+ * @info:	the generic netlink struct passed from netlink driver
+ *
+ * Implements the .doit function to process binder genl commands.
+ */
+static int binder_genl_cmd_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	int pid;
+
+	pid = nlmsg_hdr(skb)->nlmsg_pid;
+
+	/* The only cmd is to register the process to receive binder reports */
+	return binder_genl_cmd_register(info, pid);
+}
+
+/**
+ * binder_genl_ops - the small version of generic netlink operations
+ *
+ * Supports only 1 command right now for registration handshake.
+ */
+static struct genl_small_ops binder_genl_ops[] = {
+	{
+		.cmd = BINDER_GENL_CMD_REGISTER,
+		.doit = binder_genl_cmd_doit,
+	}
+};
+
+/**
+ * binder_gnl_family - the binder generic netlink family
+ *
+ * Defines policies and supported operations of binder genl
+ */
+static struct genl_family binder_gnl_family = {
+	.name = BINDER_GENL_FAMILY_NAME,
+	.version = BINDER_GENL_VERSION,
+	.maxattr = BINDER_GENL_ATTR_MAX,
+	.policy	= binder_report_policy,
+	.small_ops = binder_genl_ops,
+	.n_small_ops = ARRAY_SIZE(binder_genl_ops),
+};
+
+/**
+ * binder_genl_init() - initialize binder generic netlink
+ *
+ * Registers the binder generic netlink family.
+ */
+int __init binder_genl_init(void)
+{
+	int ret = genl_register_family(&binder_gnl_family);
+
+	if (ret) {
+		pr_err("Failed to register binder genl\n");
+		return ret;
+	}
+
+	return 0;
+}
+
+/**
+ * binder_genl_report_enabled() - check if binder genl reports are enabled
+ * @proc:	the binder_proc to check
+ * @mask:	the categories of binder genl reports
+ *
+ * Returns true if certain binder genl reports are enabled for this binder
+ * proc (when per-process overriding takes effect) or context.
+ */
+inline bool binder_genl_report_enabled(struct binder_proc *proc, u32 mask)
+{
+	struct binder_context *context = proc->context;
+
+	if (proc->report_flags & BINDER_REPORT_OVERRIDE)
+		return (proc->report_flags & mask) != 0;
+	else
+		return (context->report_flags & mask) != 0;
+}
+
+/**
+ * binder_genl_send_report() - send one binder genl report
+ * @report:	the binder genl report to send
+ * @len:	the length of the report data
+ *
+ * Packs the report data into a BINDER_GENL_ATTR_REPORT packet and sends them
+ * to the previously registered process.
+ */
+void binder_genl_send_report(struct binder_report *report, int len)
+{
+	struct sk_buff *skb;
+	void *hdr;
+
+	trace_binder_send_report(report, len);
+
+	skb = genlmsg_new(len, GFP_KERNEL);
+	if (!skb) {
+		pr_err("Failed to alloc binder genl message\n");
+		return;
+	}
+
+	hdr = genlmsg_put(skb, binder_report_pid, 0, &binder_gnl_family, 0,
+			  BINDER_GENL_CMD_REPORT);
+	if (!hdr) {
+		pr_err("Failed to set binder genl header\n");
+		kfree_skb(skb);
+		return;
+	}
+
+	if (nla_put(skb, BINDER_GENL_ATTR_REPORT, len, report)) {
+		genlmsg_cancel(skb, hdr);
+		nlmsg_free(skb);
+		return;
+	}
+
+	genlmsg_end(skb, hdr);
+
+	if (genlmsg_unicast(&init_net, skb, binder_report_pid))
+		pr_err("Failed to send binder genl message\n");
+}
diff --git a/drivers/android/binder_genl.h b/drivers/android/binder_genl.h
new file mode 100644
index 000000000000..b017c2786860
--- /dev/null
+++ b/drivers/android/binder_genl.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2024 Google, Inc.
+ */
+
+#ifndef _LINUX_BINDER_GENL_H
+#define _LINUX_BINDER_GENL_H
+
+#include "binder_internal.h"
+
+int binder_genl_init(void);
+
+bool binder_genl_report_enabled(struct binder_proc *proc, u32 mask);
+
+void binder_genl_send_report(struct binder_report *report, int len);
+
+#endif /* _LINUX_BINDER_GENL_H */
diff --git a/drivers/android/binder_internal.h b/drivers/android/binder_internal.h
index 7d4fc53f7a73..34cd2cf1b345 100644
--- a/drivers/android/binder_internal.h
+++ b/drivers/android/binder_internal.h
@@ -16,11 +16,21 @@
 #include "binder_alloc.h"
 #include "dbitmap.h"
 
+/**
+ * struct binder_context - information about a binder domain
+ * @binder_context_mgr_node: the context manager
+ * @context_mgr_node_lock:   the lock protecting the above context manager node
+ * @binder_context_mgr_uid:  the uid of the above context manager
+ * @name:                    the name of the binder device
+ * @report_flags:            the categories of binder transactions that would
+ *                           be reported (see enum binder_report_flag).
+ */
 struct binder_context {
 	struct binder_node *binder_context_mgr_node;
 	struct mutex context_mgr_node_lock;
 	kuid_t binder_context_mgr_uid;
 	const char *name;
+	u32 report_flags;
 };
 
 /**
@@ -399,6 +409,8 @@ struct binder_ref {
  * @binderfs_entry:       process-specific binderfs log file
  * @oneway_spam_detection_enabled: process enabled oneway spam detection
  *                        or not
+ * @report_flags:         the categories of binder transactions that would
+ *                        be reported (see enum binder_report_flag).
  *
  * Bookkeeping structure for binder processes
  */
@@ -436,6 +448,7 @@ struct binder_proc {
 	spinlock_t outer_lock;
 	struct dentry *binderfs_entry;
 	bool oneway_spam_detection_enabled;
+	u32 report_flags;
 };
 
 /**
diff --git a/drivers/android/binder_trace.h b/drivers/android/binder_trace.h
index fe38c6fc65d0..4c69c5f99e5c 100644
--- a/drivers/android/binder_trace.h
+++ b/drivers/android/binder_trace.h
@@ -423,6 +423,43 @@ TRACE_EVENT(binder_return,
 			  "unknown")
 );
 
+TRACE_EVENT(binder_send_report,
+	TP_PROTO(struct binder_report *report, int len),
+	TP_ARGS(report, len),
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(uint32_t, err)
+		__field(uint32_t, from_pid)
+		__field(uint32_t, from_tid)
+		__field(uint32_t, to_pid)
+		__field(uint32_t, to_tid)
+		__field(uint32_t, reply)
+		__field(uint32_t, flags)
+		__field(uint32_t, code)
+		__field(binder_size_t, data_size)
+		__field(uint32_t, len)
+	),
+	TP_fast_assign(
+		__entry->name = report->name;
+		__entry->err = report->err;
+		__entry->from_pid = report->from_pid;
+		__entry->from_tid = report->from_tid;
+		__entry->to_pid = report->to_pid;
+		__entry->to_tid = report->to_tid;
+		__entry->reply = report->reply;
+		__entry->flags = report->flags;
+		__entry->code = report->code;
+		__entry->data_size = report->data_size;
+		__entry->len = len;
+	),
+	TP_printk("%s: %d %d:%d -> %d:%d %s flags=0x08%x code=%d %llu %d",
+		  __entry->name, __entry->err, __entry->from_pid,
+		  __entry->from_tid, __entry->to_pid, __entry->to_tid,
+		  __entry->reply ? "reply" : "",
+		  __entry->flags, __entry->code, __entry->data_size,
+		  __entry->len)
+);
+
 #endif /* _BINDER_TRACE_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
index d44a8118b2ed..893192e5cc55 100644
--- a/include/uapi/linux/android/binder.h
+++ b/include/uapi/linux/android/binder.h
@@ -266,6 +266,7 @@ enum {
 	BINDER_GET_FROZEN_INFO		= _IOWR('b', 15, struct binder_frozen_status_info),
 	BINDER_ENABLE_ONEWAY_SPAM_DETECTION	= _IOW('b', 16, __u32),
 	BINDER_GET_EXTENDED_ERROR	= _IOWR('b', 17, struct binder_extended_error),
+	BINDER_ENABLE_REPORT		= _IOW('b', 18, __u32),
 };
 
 /*
@@ -552,5 +553,151 @@ enum binder_driver_command_protocol {
 	 */
 };
 
+/*
+ * Name of binder generic netlink
+ */
+#define BINDER_GENL_FAMILY_NAME	"binder"
+
+/*
+ * Version of binder generic netlink
+ */
+#define BINDER_GENL_VERSION	1
+
+/*
+ * Used with BINDER_ENABLE_REPORT, defining what kind of binder transactions
+ * should be reported to user space administration process.
+ */
+enum binder_report_flag {
+	/*
+	 * Report failed binder transactions,
+	 * when the sender gets BR_{FAILED|FROZEN|DEAD}_REPLY
+	 */
+	BINDER_REPORT_FAILED = 0x1,
+
+	/*
+	 * Report delayed async binder transactions due to frozen target,
+	 * when the sender gets BR_TRANSACTION_PENDING_FROZEN.
+	 */
+	BINDER_REPORT_DELAYED = 0x2,
+
+	/*
+	 * Report spamming binder transactions,
+	 * when the sender gets BR_ONEWAY_SPAM_SUSPECT.
+	 */
+	BINDER_REPORT_SPAM = 0x4,
+
+	/*
+	 * Report all of the above
+	 */
+	BINDER_REPORT_ALL = BINDER_REPORT_FAILED
+			| BINDER_REPORT_DELAYED
+			| BINDER_REPORT_SPAM,
+
+	/*
+	 * Enable the per-process flag, which overrides the global one.
+	 */
+	BINDER_REPORT_OVERRIDE = 0x8000000,
+};
+
+/* struct binder_report_info - binder report information
+ * @pid:	the process to apply the flags
+ * @flags:	the enabled binder reports (see enum binder_report_flag)
+ *
+ * Used with BINDER_ENABLE_REPORT. If pid is 0, the flags are applied to the
+ * whole binder context. Otherwise, the flags are applied to the specific
+ * process only.
+ */
+struct binder_report_info {
+	__u32 pid;
+	__u32 flags;
+};
+
+/*
+ * struct binder_report - reports an abnormal binder transaction
+ * @name:	copy of binder_context->name where the txn happens
+ * @err:	copy of binder_driver_return_protocol returned to the sender
+ * @from_pid:	sender pid of the corresponding binder transaction
+ * @from_tid:	sender tid of the corresponding binder transaction
+ * @to_pid:	target pid of the corresponding binder transaction
+ * @to_tid:	target tid of the corresponding binder transaction
+ * @reply:	1 means the txn is a reply, 0 otherwise
+ * @flags:	copy of binder_transaction_data->flags
+ * @code:	copy of binder_transaction_data->code
+ * @data_size:	copy of binder_transaction_data->data_size
+ *
+ * When a binder transaction fails to reach the target process or is not
+ * delivered on time, an error command BR_XXX is returned from the kernel
+ * binder driver to the user space sender. This is considered an abnormal
+ * binder transaction. The most important information about this abnormal
+ * binder transaction will be packed into this binder_report struct and sent
+ * to the registered user space administration process via generic netlink.
+ */
+struct binder_report {
+	__u8 name[16];
+	__u32 err;
+	__u32 from_pid;
+	__u32 from_tid;
+	__u32 to_pid;
+	__u32 to_tid;
+	__u32 reply;
+	__u32 flags;
+	__u32 code;
+	binder_size_t data_size;
+};
+
+/*
+ * The supported attributes of binder generic netlink.
+ */
+enum binder_genl_attr {
+	BINDER_GENL_ATTR_UNSPEC = 0,
+
+	/*
+	 * The attribute for BINDER_GENL_CMD_REPLY to reply to
+	 * BINDER_GENL_CMD_REGISTER.
+	 * Contains payload binder_report_pid.
+	 */
+	BINDER_GENL_ATTR_PID  = 1,
+
+	/*
+	 * The attribute for BINDER_GENL_CMD_REPORT.
+	 * Contains payload binder_report.
+	 */
+	BINDER_GENL_ATTR_REPORT = 2,
+
+	__BINDER_GENL_ATTR_MAX  = 3,
+};
+
+#define BINDER_GENL_ATTR_MAX (__BINDER_GENL_ATTR_MAX - 1)
+
+/*
+ * The supported commands of binder generic netlink.
+ */
+enum binder_genl_cmd {
+	BINDER_GENL_CMD_UNSPEC = 0,
+
+	/*
+	 * The user space administrator process uses this command to register
+	 * itself to receive binder reports via generic netlink.
+	 */
+	BINDER_GENL_CMD_REGISTER = 1,
+
+	/*
+	 * After receiving BINDER_GENL_CMD_REGISTER from the user space
+	 * administrator process, the kernel binder driver uses this command
+	 * to acknowledge the request.
+	 */
+	BINDER_GENL_CMD_REPLY  = 2,
+
+	/*
+	 * After enabling binder report, the kernel binder driver uses this
+	 * command to send the requested reports to the user space.
+	 */
+	BINDER_GENL_CMD_REPORT = 3,
+
+	__BINDER_GENL_CMD_MAX  = 4,
+};
+
+#define BINDER_GENL_CMD_MAX (__BINDER_GENL_CMD_MAX - 1)
+
 #endif /* _UAPI_LINUX_BINDER_H */
 
-- 
2.46.0.76.ge559c4bf1a-goog
Re: [PATCH v1] add binder genl for txn report
Posted by kernel test robot 1 year, 5 months ago
Hi Li,

kernel test robot noticed the following build errors:

[auto build test ERROR on staging/staging-testing]
[also build test ERROR on staging/staging-next staging/staging-linus linus/master v6.11-rc3 next-20240814]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Li-Li/add-binder-genl-for-txn-report/20240814-150338
base:   staging/staging-testing
patch link:    https://lore.kernel.org/r/20240812211844.4107494-1-dualli%40chromium.org
patch subject: [PATCH v1] add binder genl for txn report
config: x86_64-buildonly-randconfig-005-20240814 (https://download.01.org/0day-ci/archive/20240815/202408150142.elc854F6-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240815/202408150142.elc854F6-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408150142.elc854F6-lkp@intel.com/

All errors (new ones prefixed by >>):

   ld: vmlinux.o: in function `binder_genl_cmd_doit':
   binder_genl.c:(.text+0x1284e00): undefined reference to `__alloc_skb'
   ld: binder_genl.c:(.text+0x1284e2f): undefined reference to `genlmsg_put'
   ld: binder_genl.c:(.text+0x1284e5c): undefined reference to `nla_put'
   ld: binder_genl.c:(.text+0x1284e94): undefined reference to `init_net'
   ld: binder_genl.c:(.text+0x1284e9d): undefined reference to `netlink_unicast'
   ld: binder_genl.c:(.text+0x1284f04): undefined reference to `skb_trim'
   ld: binder_genl.c:(.text+0x1284f18): undefined reference to `sk_skb_reason_drop'
   ld: binder_genl.c:(.text+0x1284f72): undefined reference to `sk_skb_reason_drop'
   ld: vmlinux.o: in function `binder_genl_send_report':
>> (.text+0x12850d9): undefined reference to `__alloc_skb'
>> ld: (.text+0x128510a): undefined reference to `genlmsg_put'
>> ld: (.text+0x1285131): undefined reference to `nla_put'
>> ld: (.text+0x128516e): undefined reference to `init_net'
>> ld: (.text+0x1285173): undefined reference to `netlink_unicast'
>> ld: (.text+0x12851c2): undefined reference to `skb_trim'
>> ld: (.text+0x12851d6): undefined reference to `sk_skb_reason_drop'
   ld: (.text+0x128527a): undefined reference to `sk_skb_reason_drop'
   ld: vmlinux.o: in function `binder_genl_init':
>> (.init.text+0xa5bd7): undefined reference to `genl_register_family'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH v1] add binder genl for txn report
Posted by kernel test robot 1 year, 5 months ago
Hi Li,

kernel test robot noticed the following build errors:

[auto build test ERROR on staging/staging-testing]
[also build test ERROR on staging/staging-next staging/staging-linus linus/master v6.11-rc3 next-20240814]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Li-Li/add-binder-genl-for-txn-report/20240814-150338
base:   staging/staging-testing
patch link:    https://lore.kernel.org/r/20240812211844.4107494-1-dualli%40chromium.org
patch subject: [PATCH v1] add binder genl for txn report
config: x86_64-buildonly-randconfig-002-20240814 (https://download.01.org/0day-ci/archive/20240815/202408150004.BAtK29zS-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240815/202408150004.BAtK29zS-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408150004.BAtK29zS-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

>> drivers/android/binder_genl.c:20: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * The registered process that would receive binder reports.
   drivers/android/binder_genl.c:25: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * The policy to verify the type of the binder genl data
>> drivers/android/binder_genl.c:102: warning: cannot understand function prototype: 'struct genl_small_ops binder_genl_ops[] = '
>> drivers/android/binder_genl.c:114: warning: cannot understand function prototype: 'struct genl_family binder_gnl_family = '
--
   ld: drivers/android/binder_genl.o: in function `binder_genl_cmd_doit':
>> binder_genl.c:(.text+0x38): undefined reference to `__alloc_skb'
>> ld: binder_genl.c:(.text+0x61): undefined reference to `genlmsg_put'
>> ld: binder_genl.c:(.text+0x83): undefined reference to `nla_put'
>> ld: binder_genl.c:(.text+0xb7): undefined reference to `init_net'
>> ld: binder_genl.c:(.text+0xbc): undefined reference to `netlink_unicast'
>> ld: binder_genl.c:(.text+0x135): undefined reference to `skb_trim'
>> ld: binder_genl.c:(.text+0x144): undefined reference to `sk_skb_reason_drop'
   ld: binder_genl.c:(.text+0x195): undefined reference to `sk_skb_reason_drop'
   ld: drivers/android/binder_genl.o: in function `binder_genl_send_report':
   binder_genl.c:(.text+0x21c): undefined reference to `__alloc_skb'
   ld: binder_genl.c:(.text+0x248): undefined reference to `genlmsg_put'
   ld: binder_genl.c:(.text+0x267): undefined reference to `nla_put'
   ld: binder_genl.c:(.text+0x298): undefined reference to `init_net'
   ld: binder_genl.c:(.text+0x29d): undefined reference to `netlink_unicast'
   ld: binder_genl.c:(.text+0x304): undefined reference to `skb_trim'
   ld: binder_genl.c:(.text+0x313): undefined reference to `sk_skb_reason_drop'
   ld: binder_genl.c:(.text+0x34c): undefined reference to `sk_skb_reason_drop'
   ld: drivers/android/binder_genl.o: in function `binder_genl_init':
>> binder_genl.c:(.init.text+0x12): undefined reference to `genl_register_family'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH v1] add binder genl for txn report
Posted by Greg KH 1 year, 5 months ago
On Mon, Aug 12, 2024 at 02:18:44PM -0700, Li Li wrote:
> From: Li Li <dualli@google.com>

Sorry, but I can not parse your Subject: line at all.  Please use
vowels, we don't have a lack of them :)

Also look at how other patches are formatted for these files to get an
idea of how to create a good subject line.

> Frozen tasks can't process binder transactions, so sync binder
> transactions will fail with BR_FROZEN_REPLY and async binder
> transactions will be queued in the kernel async binder buffer.
> As these queued async transactions accumulates over time, the async
> buffer will eventually be running out, denying all new transactions
> after that with BR_FAILED_REPLY.
> 
> In addition to the above cases, different kinds of binder error codes
> might be returned to the sender. However, the core Linux, or Android,
> system administration process never knows what's actually happening.
> 
> This patch introduces the Linux generic netlink messages into the binder
> driver so that the Linux/Android system administration process can
> listen to important events and take corresponding actions, like stopping
> a broken app from attacking the OS by sending huge amount of spamming
> binder transactions.
> 
> To prevent making the already bloated binder.c even bigger, a new source
> file binder_genl.c is created to host those generic netlink code.

"genl" is a rough abbreviation that is not going to be easy to remember,
what's wrong with "binder_netlink.c"?


> 
> Usage example (user space pseudo code):
> 
> 	// enable binder report from the interested binder context(s)
> 	struct binder_report_info info = {0, BINDER_REPORT_ALL};
> 	ioctl(binder1, BINDER_ENABLE_REPORT, &info);
> 	ioctl(binder2, BINDER_ENABLE_REPORT, &info);
> 
> 	// set optional per-process report, overriding the global one
> 	struct binder_report_info info2;
> 	info2.pid = getpid();
> 	info2.flags = BINDER_REPORT_FAILED | BINDER_REPORT_OVERRIDE;
> 	ioctl(binder2, BINDER_ENABLE_REPORT, &info2);
> 
> 	// open netlink socket
> 	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
> 
> 	// bind netlink socket
> 	bind(fd, struct socketaddr);
> 
> 	// get the family id of binder generic netlink
> 	send(fd, CTRL_CMD_GETFAMILY, CTRL_ATTR_FAMILY_NAME);
> 	int id = recv(CTRL_ATTR_FAMILY_ID);
> 
> 	// register the current process to receive binder reports
> 	send(fd, id, BINDER_GENL_CMD_REGISTER);
> 
> 	// verify registration by reading back the registered pid
> 	recv(fd, BINDER_GENL_ATTR_PID, &pid);
> 
> 	// wait and read all binder reports
> 	while (running) {
> 		struct binder_report report;
> 		recv(fd, BINDER_GENL_ATTR_REPORT, &report);
> 
> 		// process struct binder_report
> 		do_something(&report);
> 	}
> 
> 	// clean up
> 	close(fd);

What userspace code is now going to use this and require it?  How was it
tested?  Where is the test code?  Where is the new user/kernel api that
you created here documented?

You added a new ioctl here as well, why not mention that?  Why is it
needed?  Why not always emit netlink messages?  How do you turn them
off?

And how does this deal with binder namespaces?  Are these messages all
now "global" somehow?  Or are they separated properly per namespace?

thanks,

greg k-h
Re: [PATCH v1] add binder genl for txn report
Posted by Li Li 1 year, 5 months ago
On Mon, Aug 12, 2024 at 10:13 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Mon, Aug 12, 2024 at 02:18:44PM -0700, Li Li wrote:
> > From: Li Li <dualli@google.com>
>
> Sorry, but I can not parse your Subject: line at all.  Please use
> vowels, we don't have a lack of them :)
>
> Also look at how other patches are formatted for these files to get an
> idea of how to create a good subject line.

Thank you for reviewing the patch!

Sure, I'll find a more meaningful subject in v2.

> > To prevent making the already bloated binder.c even bigger, a new source
> > file binder_genl.c is created to host those generic netlink code.
>
> "genl" is a rough abbreviation that is not going to be easy to remember,
> what's wrong with "binder_netlink.c"?

It's just because genl has already been used in both of generic netlink
kernel code (e.g. in linux/include/net/genetlink.h) and user space libraries
https://man7.org/linux/man-pages/man8/genl.8.html.

I'm happy to rename it to binder_netlink.c if that's easier to remember.

>
>
> >
> > Usage example (user space pseudo code):
> >
> >       // enable binder report from the interested binder context(s)
> >       struct binder_report_info info = {0, BINDER_REPORT_ALL};
> >       ioctl(binder1, BINDER_ENABLE_REPORT, &info);
> >       ioctl(binder2, BINDER_ENABLE_REPORT, &info);
> >
> >       // set optional per-process report, overriding the global one
> >       struct binder_report_info info2;
> >       info2.pid = getpid();
> >       info2.flags = BINDER_REPORT_FAILED | BINDER_REPORT_OVERRIDE;
> >       ioctl(binder2, BINDER_ENABLE_REPORT, &info2);
> >
> >       // open netlink socket
> >       int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
> >
> >       // bind netlink socket
> >       bind(fd, struct socketaddr);
> >
> >       // get the family id of binder generic netlink
> >       send(fd, CTRL_CMD_GETFAMILY, CTRL_ATTR_FAMILY_NAME);
> >       int id = recv(CTRL_ATTR_FAMILY_ID);
> >
> >       // register the current process to receive binder reports
> >       send(fd, id, BINDER_GENL_CMD_REGISTER);
> >
> >       // verify registration by reading back the registered pid
> >       recv(fd, BINDER_GENL_ATTR_PID, &pid);
> >
> >       // wait and read all binder reports
> >       while (running) {
> >               struct binder_report report;
> >               recv(fd, BINDER_GENL_ATTR_REPORT, &report);
> >
> >               // process struct binder_report
> >               do_something(&report);
> >       }
> >
> >       // clean up
> >       close(fd);
>
> What userspace code is now going to use this and require it?  How was it
> tested?  Where is the test code?  Where is the new user/kernel api that
> you created here documented?

As mentioned in the commit message, binder is used in Android OS. But the
user space administration process can do little to deal with binder transaction
errors. This is tested with Android. I'll upload the user space code to AOSP.
If there's a better option to host the test code, e.g. a smaller and
simpler project
that uses binder, please let me know.

>
> You added a new ioctl here as well, why not mention that?  Why is it
> needed?  Why not always emit netlink messages?  How do you turn them
> off?

The generic netlink message is a convenient way for the kernel driver to send
information to user space. Technically it's possible to replace this
new ioctl with
a netlink message. But that requires much more unnecessary code parsing the
raw byte streams and accessing internal binder_proc and other structures from
netlink layer. It's much more elegant to use an ioctl with only a
couple lines of
source code.

To turn them off, just call the same ioctl, resetting the flags to 0.
That said, the
name of this new ioctl (BINDER_ENABLE_REPORT) isn't good enough.
What do you think if I change it to BINDER_SET_NETLINK_REPORT?

>
> And how does this deal with binder namespaces?  Are these messages all
> now "global" somehow?  Or are they separated properly per namespace?

The new ioctl BINDER_ENABLE_REPORT (again, it deserves a better name)
sets the report_flags for its associated binder context. Each binder context has
its own settings. The messages from all binder contexts are sent to user space
via the same netlink socket. The user space code can enable the reports for
each binder context separately and distinguish the netlink messages by the
name of the binder context.

It's also possible to have one netlink socket for each binder context.
But it seems
like overkill to me.

Thanks,
Li
Re: [PATCH v1] add binder genl for txn report
Posted by Greg KH 1 year, 5 months ago
On Mon, Aug 12, 2024 at 11:16:27PM -0700, Li Li wrote:
> On Mon, Aug 12, 2024 at 10:13 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Mon, Aug 12, 2024 at 02:18:44PM -0700, Li Li wrote:
> > > From: Li Li <dualli@google.com>
> >
> > Sorry, but I can not parse your Subject: line at all.  Please use
> > vowels, we don't have a lack of them :)
> >
> > Also look at how other patches are formatted for these files to get an
> > idea of how to create a good subject line.
> 
> Thank you for reviewing the patch!
> 
> Sure, I'll find a more meaningful subject in v2.
> 
> > > To prevent making the already bloated binder.c even bigger, a new source
> > > file binder_genl.c is created to host those generic netlink code.
> >
> > "genl" is a rough abbreviation that is not going to be easy to remember,
> > what's wrong with "binder_netlink.c"?
> 
> It's just because genl has already been used in both of generic netlink
> kernel code (e.g. in linux/include/net/genetlink.h) and user space libraries
> https://man7.org/linux/man-pages/man8/genl.8.html.

Ah, I wasn't aware of the existing names, so that's fine if it is what
the networking world is used to.

Which reminds me, why aren't you asking for their review here as well to
ensure that you are doing things with netlink correctly?

> > What userspace code is now going to use this and require it?  How was it
> > tested?  Where is the test code?  Where is the new user/kernel api that
> > you created here documented?
> 
> As mentioned in the commit message, binder is used in Android OS. But the
> user space administration process can do little to deal with binder transaction
> errors. This is tested with Android. I'll upload the user space code to AOSP.
> If there's a better option to host the test code, e.g. a smaller and
> simpler project
> that uses binder, please let me know.

It needs to be somewhere, otherwise we don't know how any of this is
being used at all.  And there was some binder "test code" somewhere, is
this new functionality added to that also?

> > You added a new ioctl here as well, why not mention that?  Why is it
> > needed?  Why not always emit netlink messages?  How do you turn them
> > off?
> 
> The generic netlink message is a convenient way for the kernel driver to send
> information to user space. Technically it's possible to replace this
> new ioctl with
> a netlink message. But that requires much more unnecessary code parsing the
> raw byte streams and accessing internal binder_proc and other structures from
> netlink layer. It's much more elegant to use an ioctl with only a
> couple lines of
> source code.

Then you need to document that somewhere.

> To turn them off, just call the same ioctl, resetting the flags to 0.
> That said, the
> name of this new ioctl (BINDER_ENABLE_REPORT) isn't good enough.
> What do you think if I change it to BINDER_SET_NETLINK_REPORT?

Yes, the name needs to change if you can both enable and disable reports
from it.

> > And how does this deal with binder namespaces?  Are these messages all
> > now "global" somehow?  Or are they separated properly per namespace?
> 
> The new ioctl BINDER_ENABLE_REPORT (again, it deserves a better name)
> sets the report_flags for its associated binder context. Each binder context has
> its own settings. The messages from all binder contexts are sent to user space
> via the same netlink socket. The user space code can enable the reports for
> each binder context separately and distinguish the netlink messages by the
> name of the binder context.

So userspace will now get all reports and has to do the parsing to
determine what is, and is not, relevant for what they want?  That feels
like a big security hole to me, has this been audited by the Android
security team yet?

> It's also possible to have one netlink socket for each binder context.
> But it seems
> like overkill to me.

Again, think of security issues please.  Do you want all binder
processes in a system to be able to see all other messages?

thanks,

greg k-h