[PATCH] libbpf: fix BTF dedup to support recursive typedef definitions

paulhoussel2@gmail.com posted 1 patch 1 month, 1 week ago
tools/lib/bpf/btf.c | 97 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 95 insertions(+), 2 deletions(-)
[PATCH] libbpf: fix BTF dedup to support recursive typedef definitions
Posted by paulhoussel2@gmail.com 1 month, 1 week ago
From: Paul Houssel <paul.houssel@orange.com>

Handle recursive typedefs in BTF deduplication

Pahole fails to encode BTF for some Go projects (e.g. Kubernetes and
Podman) due to recursive type definitions that create reference loops
not representable in C. These recursive typedefs trigger a failure in
the BTF deduplication algorithm.

This patch extends btf_dedup_ref_type() to properly handle potential
recursion for BTF_KIND_TYPEDEF, similar to how recursion is already
handled for BTF_KIND_STRUCT. This allows pahole to successfully
generate BTF for Go binaries using recursive types without impacting
existing C-based workflows.

Co-developed-by: Martin Horth <martin.horth@telecom-sudparis.eu>
Signed-off-by: Martin Horth <martin.horth@telecom-sudparis.eu>
Co-developed-by: Ouail Derghal <ouail.derghal@imt-atlantique.fr>
Signed-off-by: Ouail Derghal <ouail.derghal@imt-atlantique.fr>
Co-developed-by: Guilhem Jazeron <guilhem.jazeron@inria.fr>
Signed-off-by: Guilhem Jazeron <guilhem.jazeron@inria.fr>
Co-developed-by: Ludovic Paillat <ludovic.paillat@inria.fr>
Signed-off-by: Ludovic Paillat <ludovic.paillat@inria.fr>
Co-developed-by: Robin Theveniaut <robin.theveniaut@irit.fr>
Signed-off-by: Robin Theveniaut <robin.theveniaut@irit.fr>
Suggested-by: Tristan d'Audibert <tristan.daudibert@gmail.com>
Signed-off-by: Paul Houssel <paul.houssel@orange.com>

---
The issue was originally observed when attempting to encode BTF for
Kubernetes binaries (kubectl, kubeadm):

$ git clone --depth 1 https://github.com/kubernetes/kubernetes
$ cd ./kubernetes
$ make kubeadm DBG=1
$ pahole --btf_encode_detached=kubeadm.btf _output/bin/kubeadm
btf_encoder__encode: btf__dedup failed!
Failed to encode BTF

The root cause lies in recursive type definitions that cannot exist
in C but are valid in Go.

program.go:

"package main

type Foo func() Foo

func main() {
	bar()
}

func bar() Foo {
	return nil
}"

Building and encoding this program with pahole triggers the same
deduplication failure:

$ go build -gcflags "all=-N -l" ./program.go
$ pahole --btf_encode_detached=program.btf program
btf_encoder__encode: btf__dedup failed!
Failed to encode BTF

As noted in the comment of btf_dedup_ref_type(), the deduplication
logic previously assumed recursion only occurs through structs or
unions:

"[...] there is no danger of encountering cycles because in C type
system the only way to form type cycle is through struct/union, so
any chain of reference types, even those taking part in a type
cycle, will inevitably reach struct/union at some point."

However, Go allows such recursion through typedef-like constructs
(function types, aliases), requiring a special case for
BTF_KIND_TYPEDEF.

This patch introduces that special handling, ensuring pahole can
handle Go-generated BTFs while maintaining compatibility with
existing C workflows.
---
 tools/lib/bpf/btf.c | 97 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 95 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 9f141395c074..239f52115c53 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -3408,6 +3408,7 @@ static int btf_dedup_prep(struct btf_dedup *d);
 static int btf_dedup_strings(struct btf_dedup *d);
 static int btf_dedup_prim_types(struct btf_dedup *d);
 static int btf_dedup_struct_types(struct btf_dedup *d);
+static int btf_dedup_typedef_types(struct btf_dedup *d);
 static int btf_dedup_ref_types(struct btf_dedup *d);
 static int btf_dedup_resolve_fwds(struct btf_dedup *d);
 static int btf_dedup_compact_types(struct btf_dedup *d);
@@ -3590,6 +3591,11 @@ int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts)
 		pr_debug("btf_dedup_struct_types failed: %s\n", errstr(err));
 		goto done;
 	}
+	err = btf_dedup_typedef_types(d);
+	if (err < 0) {
+		pr_debug("btf_dedup_typedef_types failed: %s\n", errstr(err));
+		goto done;
+	}
 	err = btf_dedup_resolve_fwds(d);
 	if (err < 0) {
 		pr_debug("btf_dedup_resolve_fwds failed: %s\n", errstr(err));
@@ -3901,6 +3907,20 @@ static int btf_dedup_strings(struct btf_dedup *d)
 	return err;
 }
 
+/*
+ * Calculate type signature hash of TYPEDEF, ignoring referenced type IDs,
+ * as referenced type IDs equivalence is established separately during type
+ * graph equivalence check algorithm.
+ */
+static long btf_hash_typedef(struct btf_type *t)
+{
+	long h;
+
+	h = hash_combine(0, t->name_off);
+	h = hash_combine(h, t->info);
+	return h;
+}
+
 static long btf_hash_common(struct btf_type *t)
 {
 	long h;
@@ -3918,6 +3938,13 @@ static bool btf_equal_common(struct btf_type *t1, struct btf_type *t2)
 	       t1->size == t2->size;
 }
 
+/* Check structural compatibility of two TYPEDEF. */
+static bool btf_equal_typedef(struct btf_type *t1, struct btf_type *t2)
+{
+	return t1->name_off == t2->name_off &&
+	       t1->info == t2->info;
+}
+
 /* Calculate type signature hash of INT or TAG. */
 static long btf_hash_int_decl_tag(struct btf_type *t)
 {
@@ -4936,10 +4963,77 @@ static int btf_dedup_struct_types(struct btf_dedup *d)
 	return 0;
 }
 
+/*
+ * Deduplicate typedef types.
+ *
+ * Similar as for struct/union types, for each typedef type its type
+ * signature hash is calculated, taking into account type's name
+ * and its size, but ignoring type ID's referenced from fields,
+ * because they might not be deduped completely until after
+ * reference types deduplication phase.
+ */
+static int btf_dedup_typedef_type(struct btf_dedup *d, __u32 type_id)
+{
+	struct btf_type *cand_type, *t;
+	struct hashmap_entry *hash_entry;
+	/* if we don't find equivalent type, then we are canonical */
+	__u32 new_id = type_id;
+	__u16 kind;
+	long h;
+
+	if (d->map[type_id] <= BTF_MAX_NR_TYPES)
+		return 0;
+
+	t = btf_type_by_id(d->btf, type_id);
+	kind = btf_kind(t);
+
+	if (kind != BTF_KIND_TYPEDEF)
+		return 0;
+	h = btf_hash_typedef(t);
+	for_each_dedup_cand(d, hash_entry, h) {
+		__u32 cand_id = hash_entry->value;
+		int eq;
+
+		cand_type = btf_type_by_id(d->btf, cand_id);
+		if (!btf_equal_typedef(t, cand_type))
+			continue;
+
+		btf_dedup_clear_hypot_map(d);
+		eq = btf_dedup_is_equiv(d, type_id, cand_id);
+		if (eq < 0)
+			return eq;
+		if (!eq)
+			continue;
+		btf_dedup_merge_hypot_map(d);
+		if (d->hypot_adjust_canon) /* not really equivalent */
+			continue;
+		new_id = cand_id;
+		break;
+	}
+
+	d->map[type_id] = new_id;
+	if (type_id == new_id && btf_dedup_table_add(d, h, type_id))
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int btf_dedup_typedef_types(struct btf_dedup *d)
+{
+	int i, err;
+
+	for (i = 0; i < d->btf->nr_types; i++) {
+		err = btf_dedup_typedef_type(d, d->btf->start_id + i);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
 /*
  * Deduplicate reference type.
  *
- * Once all primitive and struct/union types got deduplicated, we can easily
+ * Once all primitive, struct/union and typedef types got deduplicated, we can easily
  * deduplicate all other (reference) BTF types. This is done in two steps:
  *
  * 1. Resolve all referenced type IDs into their canonical type IDs. This
@@ -4982,7 +5076,6 @@ static int btf_dedup_ref_type(struct btf_dedup *d, __u32 type_id)
 	case BTF_KIND_VOLATILE:
 	case BTF_KIND_RESTRICT:
 	case BTF_KIND_PTR:
-	case BTF_KIND_TYPEDEF:
 	case BTF_KIND_FUNC:
 	case BTF_KIND_TYPE_TAG:
 		ref_type_id = btf_dedup_ref_type(d, t->type);
-- 
2.43.0
Re: [PATCH] libbpf: fix BTF dedup to support recursive typedef definitions
Posted by Eduard Zingerman 1 month, 1 week ago
On Fri, 2025-11-07 at 16:34 +0100, paulhoussel2@gmail.com wrote:

[...]

> @@ -4936,10 +4963,77 @@ static int btf_dedup_struct_types(struct btf_dedup *d)
>  	return 0;
>  }
>
> +/*
> + * Deduplicate typedef types.
> + *
> + * Similar as for struct/union types, for each typedef type its type
> + * signature hash is calculated, taking into account type's name
> + * and its size, but ignoring type ID's referenced from fields,
> + * because they might not be deduped completely until after
> + * reference types deduplication phase.
> + */
> +static int btf_dedup_typedef_type(struct btf_dedup *d, __u32 type_id)
> +{

This function is effectively identical to btf_dedup_struct_type(),
the only things that differ are calls to hash and equal funcs:

	 	t = btf_type_by_id(d->btf, type_id);
	 	kind = btf_kind(t);

	-	if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION)
	+	if (kind != BTF_KIND_TYPEDEF)
	 		return 0;
	-
	-	h = btf_hash_struct(t);
	+	h = btf_hash_typedef(t);
	 	for_each_dedup_cand(d, hash_entry, h) {
	 		__u32 cand_id = hash_entry->value;
	 		int eq;

	 		cand_type = btf_type_by_id(d->btf, cand_id);
	-		if (!btf_shallow_equal_struct(t, cand_type))
	+		if (!btf_equal_typedef(t, cand_type))
	 			continue;

	 		btf_dedup_clear_hypot_map(d);

I don't like coping the logic related to hypot map maintenance and
d->hypot_adjust_canon flat processing.

Instead of copy-pasting, could you please modify btf_dedup_struct_type()?
- extend the type check to allow BTF_KIND_TYPEDEF;
- replace calls to btf_hash_struct() and btf_shallow_equal_struct()
  with calls to functions that select btf_hash_struct() /
  btf_hash_typedef() etc basing on the type?

Also, could you please add tests?
See tools/testing/selftests/bpf/prog_tests/btf.c.

> +	struct btf_type *cand_type, *t;
> +	struct hashmap_entry *hash_entry;
> +	/* if we don't find equivalent type, then we are canonical */
> +	__u32 new_id = type_id;
> +	__u16 kind;
> +	long h;
> +
> +	if (d->map[type_id] <= BTF_MAX_NR_TYPES)
> +		return 0;
> +
> +	t = btf_type_by_id(d->btf, type_id);
> +	kind = btf_kind(t);
> +
> +	if (kind != BTF_KIND_TYPEDEF)
> +		return 0;
> +	h = btf_hash_typedef(t);
> +	for_each_dedup_cand(d, hash_entry, h) {
> +		__u32 cand_id = hash_entry->value;
> +		int eq;
> +
> +		cand_type = btf_type_by_id(d->btf, cand_id);
> +		if (!btf_equal_typedef(t, cand_type))
> +			continue;
> +
> +		btf_dedup_clear_hypot_map(d);
> +		eq = btf_dedup_is_equiv(d, type_id, cand_id);
> +		if (eq < 0)
> +			return eq;
> +		if (!eq)
> +			continue;
> +		btf_dedup_merge_hypot_map(d);
> +		if (d->hypot_adjust_canon) /* not really equivalent */
> +			continue;
> +		new_id = cand_id;
> +		break;
> +	}
> +
> +	d->map[type_id] = new_id;
> +	if (type_id == new_id && btf_dedup_table_add(d, h, type_id))
> +		return -ENOMEM;
> +
> +	return 0;
> +}

[...]
Re: [PATCH] libbpf: fix BTF dedup to support recursive typedef definitions
Posted by Eduard Zingerman 1 month, 1 week ago
On Fri, 2025-11-07 at 16:34 +0100, paulhoussel2@gmail.com wrote:
> From: Paul Houssel <paul.houssel@orange.com>
> 
> Handle recursive typedefs in BTF deduplication
> 
> Pahole fails to encode BTF for some Go projects (e.g. Kubernetes and
> Podman) due to recursive type definitions that create reference loops
> not representable in C. These recursive typedefs trigger a failure in
> the BTF deduplication algorithm.
> 
> This patch extends btf_dedup_ref_type() to properly handle potential
> recursion for BTF_KIND_TYPEDEF, similar to how recursion is already
> handled for BTF_KIND_STRUCT. This allows pahole to successfully
> generate BTF for Go binaries using recursive types without impacting
> existing C-based workflows.
> 
> Co-developed-by: Martin Horth <martin.horth@telecom-sudparis.eu>
> Signed-off-by: Martin Horth <martin.horth@telecom-sudparis.eu>
> Co-developed-by: Ouail Derghal <ouail.derghal@imt-atlantique.fr>
> Signed-off-by: Ouail Derghal <ouail.derghal@imt-atlantique.fr>
> Co-developed-by: Guilhem Jazeron <guilhem.jazeron@inria.fr>
> Signed-off-by: Guilhem Jazeron <guilhem.jazeron@inria.fr>
> Co-developed-by: Ludovic Paillat <ludovic.paillat@inria.fr>
> Signed-off-by: Ludovic Paillat <ludovic.paillat@inria.fr>
> Co-developed-by: Robin Theveniaut <robin.theveniaut@irit.fr>
> Signed-off-by: Robin Theveniaut <robin.theveniaut@irit.fr>
> Suggested-by: Tristan d'Audibert <tristan.daudibert@gmail.com>
> Signed-off-by: Paul Houssel <paul.houssel@orange.com>
> 
> ---
> The issue was originally observed when attempting to encode BTF for
> Kubernetes binaries (kubectl, kubeadm):
> 
> $ git clone --depth 1 https://github.com/kubernetes/kubernetes
> $ cd ./kubernetes
> $ make kubeadm DBG=1
> $ pahole --btf_encode_detached=kubeadm.btf _output/bin/kubeadm
> btf_encoder__encode: btf__dedup failed!
> Failed to encode BTF

Hi Paul,

Could you please provide some details on why would you like to use BTF
for golang programs? Also, is this the only scenario when golang
generated DWARF has loops not possible in C code?

[...]
Re: [PATCH] libbpf: fix BTF dedup to support recursive typedef definitions
Posted by polo 1 month, 1 week ago
Hello Eduard,

On Fri, 7 Nov 2025 at 20:45, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2025-11-07 at 16:34 +0100, paulhoussel2@gmail.com wrote:
> > From: Paul Houssel <paul.houssel@orange.com>
> >
> > Handle recursive typedefs in BTF deduplication
> >
> > Pahole fails to encode BTF for some Go projects (e.g. Kubernetes and
> > Podman) due to recursive type definitions that create reference loops
> > not representable in C. These recursive typedefs trigger a failure in
> > the BTF deduplication algorithm.
> >
> > This patch extends btf_dedup_ref_type() to properly handle potential
> > recursion for BTF_KIND_TYPEDEF, similar to how recursion is already
> > handled for BTF_KIND_STRUCT. This allows pahole to successfully
> > generate BTF for Go binaries using recursive types without impacting
> > existing C-based workflows.
> >
> > Co-developed-by: Martin Horth <martin.horth@telecom-sudparis.eu>
> > Signed-off-by: Martin Horth <martin.horth@telecom-sudparis.eu>
> > Co-developed-by: Ouail Derghal <ouail.derghal@imt-atlantique.fr>
> > Signed-off-by: Ouail Derghal <ouail.derghal@imt-atlantique.fr>
> > Co-developed-by: Guilhem Jazeron <guilhem.jazeron@inria.fr>
> > Signed-off-by: Guilhem Jazeron <guilhem.jazeron@inria.fr>
> > Co-developed-by: Ludovic Paillat <ludovic.paillat@inria.fr>
> > Signed-off-by: Ludovic Paillat <ludovic.paillat@inria.fr>
> > Co-developed-by: Robin Theveniaut <robin.theveniaut@irit.fr>
> > Signed-off-by: Robin Theveniaut <robin.theveniaut@irit.fr>
> > Suggested-by: Tristan d'Audibert <tristan.daudibert@gmail.com>
> > Signed-off-by: Paul Houssel <paul.houssel@orange.com>
> >
> > ---
> > The issue was originally observed when attempting to encode BTF for
> > Kubernetes binaries (kubectl, kubeadm):
> >
> > $ git clone --depth 1 https://github.com/kubernetes/kubernetes
> > $ cd ./kubernetes
> > $ make kubeadm DBG=1
> > $ pahole --btf_encode_detached=kubeadm.btf _output/bin/kubeadm
> > btf_encoder__encode: btf__dedup failed!
> > Failed to encode BTF
>
> Hi Paul,
>
> Could you please provide some details on why would you like to use BTF
> for golang programs?

We would like to use BTF for Golang programs in order to trace
compiled Go user-space applications using eBPF uprobe programs.
Tetragon [1] implements the use of the BTF file to resolve paths to
attributes in hook parameters, and therefore if we can obtain the BTF
for Go programs, we will be able to start reading any attributes.
Recently, this feature has been extended to support uprobes [2].

[1] https://tetragon.io/docs/concepts/tracing-policy/hooks/#attribute-resolution
[2] https://github.com/cilium/tetragon/pull/4286#pullrequestreview-3427725698

> Also, is this the only scenario when golang
> generated DWARF has loops not possible in C code?

This is the only scenario we’ve identified where Golang DWARF contains
loops, which are not possible in C. We’re not aware of any other
Go-specific characteristics that could cause additional DWARF loops.
We tested BTF generation on a set of Go projects that are quite large
and representative of the diversity of Go programs, and we only
observed loops for this specific typedef usage.

Paul Houssel
Re: [PATCH] libbpf: fix BTF dedup to support recursive typedef definitions
Posted by Eduard Zingerman 1 month, 1 week ago
On Fri, 2025-11-07 at 21:49 +0100, polo wrote:
> Hello Eduard,
> 
> On Fri, 7 Nov 2025 at 20:45, Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Fri, 2025-11-07 at 16:34 +0100, paulhoussel2@gmail.com wrote:
> > > From: Paul Houssel <paul.houssel@orange.com>
> > > 
> > > Handle recursive typedefs in BTF deduplication
> > > 
> > > Pahole fails to encode BTF for some Go projects (e.g. Kubernetes and
> > > Podman) due to recursive type definitions that create reference loops
> > > not representable in C. These recursive typedefs trigger a failure in
> > > the BTF deduplication algorithm.
> > > 
> > > This patch extends btf_dedup_ref_type() to properly handle potential
> > > recursion for BTF_KIND_TYPEDEF, similar to how recursion is already
> > > handled for BTF_KIND_STRUCT. This allows pahole to successfully
> > > generate BTF for Go binaries using recursive types without impacting
> > > existing C-based workflows.
> > > 
> > > Co-developed-by: Martin Horth <martin.horth@telecom-sudparis.eu>
> > > Signed-off-by: Martin Horth <martin.horth@telecom-sudparis.eu>
> > > Co-developed-by: Ouail Derghal <ouail.derghal@imt-atlantique.fr>
> > > Signed-off-by: Ouail Derghal <ouail.derghal@imt-atlantique.fr>
> > > Co-developed-by: Guilhem Jazeron <guilhem.jazeron@inria.fr>
> > > Signed-off-by: Guilhem Jazeron <guilhem.jazeron@inria.fr>
> > > Co-developed-by: Ludovic Paillat <ludovic.paillat@inria.fr>
> > > Signed-off-by: Ludovic Paillat <ludovic.paillat@inria.fr>
> > > Co-developed-by: Robin Theveniaut <robin.theveniaut@irit.fr>
> > > Signed-off-by: Robin Theveniaut <robin.theveniaut@irit.fr>
> > > Suggested-by: Tristan d'Audibert <tristan.daudibert@gmail.com>
> > > Signed-off-by: Paul Houssel <paul.houssel@orange.com>
> > > 
> > > ---
> > > The issue was originally observed when attempting to encode BTF for
> > > Kubernetes binaries (kubectl, kubeadm):
> > > 
> > > $ git clone --depth 1 https://github.com/kubernetes/kubernetes
> > > $ cd ./kubernetes
> > > $ make kubeadm DBG=1
> > > $ pahole --btf_encode_detached=kubeadm.btf _output/bin/kubeadm
> > > btf_encoder__encode: btf__dedup failed!
> > > Failed to encode BTF
> > 
> > Hi Paul,
> > 
> > Could you please provide some details on why would you like to use BTF
> > for golang programs?
> 
> We would like to use BTF for Golang programs in order to trace
> compiled Go user-space applications using eBPF uprobe programs.
> Tetragon [1] implements the use of the BTF file to resolve paths to
> attributes in hook parameters, and therefore if we can obtain the BTF
> for Go programs, we will be able to start reading any attributes.
> Recently, this feature has been extended to support uprobes [2].
> 
> [1] https://tetragon.io/docs/concepts/tracing-policy/hooks/#attribute-resolution
> [2] https://github.com/cilium/tetragon/pull/4286#pullrequestreview-3427725698
> 
> > Also, is this the only scenario when golang
> > generated DWARF has loops not possible in C code?
> 
> This is the only scenario we’ve identified where Golang DWARF contains
> loops, which are not possible in C. We’re not aware of any other
> Go-specific characteristics that could cause additional DWARF loops.
> We tested BTF generation on a set of Go projects that are quite large
> and representative of the diversity of Go programs, and we only
> observed loops for this specific typedef usage.
> 
> Paul Houssel

Hi Paul,

Thank you for explaining, this sounds like a reasonable use-case.
I'll comment on the patch itself in the separate email.

Thanks,
Eduard