[PATCH net-next v3] net: Implement fault injection forcing skb reallocation

Breno Leitao posted 1 patch 1 month, 1 week ago
There is a newer version of this series
.../fault-injection/fault-injection.rst       |  35 ++++++
include/linux/skbuff.h                        |   9 ++
lib/Kconfig.debug                             |  10 ++
net/core/Makefile                             |   1 +
net/core/skb_fault_injection.c                | 103 ++++++++++++++++++
5 files changed, 158 insertions(+)
create mode 100644 net/core/skb_fault_injection.c
[PATCH net-next v3] net: Implement fault injection forcing skb reallocation
Posted by Breno Leitao 1 month, 1 week ago
Introduce a fault injection mechanism to force skb reallocation. The
primary goal is to catch bugs related to pointer invalidation after
potential skb reallocation.

The fault injection mechanism aims to identify scenarios where callers
retain pointers to various headers in the skb but fail to reload these
pointers after calling a function that may reallocate the data. This
type of bug can lead to memory corruption or crashes if the old,
now-invalid pointers are used.

By forcing reallocation through fault injection, we can stress-test code
paths and ensure proper pointer management after potential skb
reallocations.

Add a hook for fault injection in the following functions:

 * pskb_trim_rcsum()
 * pskb_may_pull_reason()
 * pskb_trim()

As the other fault injection mechanism, protect it under a debug Kconfig
called CONFIG_FAIL_SKB_FORCE_REALLOC.

This patch was *heavily* inspired by Jakub's proposal from:
https://lore.kernel.org/all/20240719174140.47a868e6@kernel.org/

CC: Akinobu Mita <akinobu.mita@gmail.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changelog:
v3:
 * Remove decision part of skb_might_realloc() into a new function
   should_fail_net_realloc_skb(). Marked it as ALLOW_ERROR_INJECTION,
   so it could be controlled by fail_function and BPF (Paolo)

v2:
 * Moved the CONFIG_FAIL_SKB_FORCE_REALLOC Kconfig entry closer to other
   fault injection Kconfigs.  (Kuniyuki Iwashima)
 * Create a filter mechanism (Akinobu Mita)
 * https://lore.kernel.org/all/20241008111358.1691157-1-leitao@debian.org/

v1:
 * https://lore.kernel.org/all/20241002113316.2527669-1-leitao@debian.org/

 .../fault-injection/fault-injection.rst       |  35 ++++++
 include/linux/skbuff.h                        |   9 ++
 lib/Kconfig.debug                             |  10 ++
 net/core/Makefile                             |   1 +
 net/core/skb_fault_injection.c                | 103 ++++++++++++++++++
 5 files changed, 158 insertions(+)
 create mode 100644 net/core/skb_fault_injection.c

diff --git a/Documentation/fault-injection/fault-injection.rst b/Documentation/fault-injection/fault-injection.rst
index 8b8aeea71c68..bb19638d5317 100644
--- a/Documentation/fault-injection/fault-injection.rst
+++ b/Documentation/fault-injection/fault-injection.rst
@@ -45,6 +45,28 @@ Available fault injection capabilities
   ALLOW_ERROR_INJECTION() macro, by setting debugfs entries
   under /sys/kernel/debug/fail_function. No boot option supported.
 
+- fail_net_force_skb_realloc
+
+  inject skb (socket buffer) reallocation events into the network path. The
+  primary goal is to identify and prevent issues related to pointer
+  mismanagement in the network subsystem.  By forcing skb reallocation at
+  strategic points, this feature creates scenarios where existing pointers to
+  skb headers become invalid.
+
+  When the fault is injected and the reallocation is triggered, these pointers
+  no longer reference valid memory locations. This deliberate invalidation
+  helps expose code paths where proper pointer updating is neglected after a
+  reallocation event.
+
+  By creating these controlled fault scenarios, the system can catch instances
+  where stale pointers are used, potentially leading to memory corruption or
+  system instability.
+
+  To select the interface to act on, write the network name to the following file:
+  `/sys/kernel/debug/fail_net_force_skb_realloc/devname`
+  If this field is left empty (which is the default value), skb reallocation
+  will be forced on all network interfaces.
+
 - NVMe fault injection
 
   inject NVMe status code and retry flag on devices permitted by setting
@@ -216,6 +238,18 @@ configuration of fault-injection capabilities.
 	use a negative errno, you better use 'printf' instead of 'echo', e.g.:
 	$ printf %#x -12 > retval
 
+- /sys/kernel/debug/fail_net_force_skb_realloc/devname:
+
+        Specifies the network interface on which to force SKB reallocation.  If
+        left empty, SKB reallocation will be applied to all network interfaces.
+
+        Example usage:
+        # Force skb reallocation on eth0
+        echo "eth0" > /sys/kernel/debug/fail_net_force_skb_realloc/devname
+
+        # Clear the selection and force skb reallocation on all interfaces
+        echo "" > /sys/kernel/debug/fail_net_force_skb_realloc/devname
+
 Boot option
 ^^^^^^^^^^^
 
@@ -227,6 +261,7 @@ use the boot option::
 	fail_usercopy=
 	fail_make_request=
 	fail_futex=
+	fail_net_force_skb_realloc=
 	mmc_core.fail_request=<interval>,<probability>,<space>,<times>
 
 proc entries
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 48f1e0fa2a13..6a77dabd86c3 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2681,6 +2681,12 @@ static inline void skb_assert_len(struct sk_buff *skb)
 #endif /* CONFIG_DEBUG_NET */
 }
 
+#if defined(CONFIG_FAIL_SKB_FORCE_REALLOC)
+void skb_might_realloc(struct sk_buff *skb);
+#else
+static inline void skb_might_realloc(struct sk_buff *skb) {}
+#endif
+
 /*
  *	Add data to an sk_buff
  */
@@ -2781,6 +2787,7 @@ static inline enum skb_drop_reason
 pskb_may_pull_reason(struct sk_buff *skb, unsigned int len)
 {
 	DEBUG_NET_WARN_ON_ONCE(len > INT_MAX);
+	skb_might_realloc(skb);
 
 	if (likely(len <= skb_headlen(skb)))
 		return SKB_NOT_DROPPED_YET;
@@ -3216,6 +3223,7 @@ static inline int __pskb_trim(struct sk_buff *skb, unsigned int len)
 
 static inline int pskb_trim(struct sk_buff *skb, unsigned int len)
 {
+	skb_might_realloc(skb);
 	return (len < skb->len) ? __pskb_trim(skb, len) : 0;
 }
 
@@ -3970,6 +3978,7 @@ int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len);
 
 static inline int pskb_trim_rcsum(struct sk_buff *skb, unsigned int len)
 {
+	skb_might_realloc(skb);
 	if (likely(len >= skb->len))
 		return 0;
 	return pskb_trim_rcsum_slow(skb, len);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 7315f643817a..fa65e14f7c61 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2115,6 +2115,16 @@ config FAIL_SUNRPC
 	  Provide fault-injection capability for SunRPC and
 	  its consumers.
 
+config FAIL_SKB_FORCE_REALLOC
+	bool "Fault-injection capability forcing skb to reallocate"
+	depends on FAULT_INJECTION_DEBUG_FS
+	help
+	  Provide fault-injection capability that forces the skb to be
+	  reallocated, caughting possible invalid pointers to the skb.
+
+	  For more information, check
+	  Documentation/dev-tools/fault-injection/fault-injection.rst
+
 config FAULT_INJECTION_CONFIGFS
 	bool "Configfs interface for fault-injection capabilities"
 	depends on FAULT_INJECTION
diff --git a/net/core/Makefile b/net/core/Makefile
index 5a72a87ee0f1..14bdc63e4b71 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -46,3 +46,4 @@ obj-$(CONFIG_OF)	+= of_net.o
 obj-$(CONFIG_NET_TEST) += net_test.o
 obj-$(CONFIG_NET_DEVMEM) += devmem.o
 obj-$(CONFIG_DEBUG_NET_SMALL_RTNL) += rtnl_net_debug.o
+obj-$(CONFIG_FAIL_SKB_FORCE_REALLOC) += skb_fault_injection.o
diff --git a/net/core/skb_fault_injection.c b/net/core/skb_fault_injection.c
new file mode 100644
index 000000000000..7839519a0a5f
--- /dev/null
+++ b/net/core/skb_fault_injection.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/fault-inject.h>
+#include <linux/netdevice.h>
+#include <linux/debugfs.h>
+#include <linux/skbuff.h>
+
+static struct {
+	struct fault_attr attr;
+	char devname[IFNAMSIZ];
+	bool filtered;
+} skb_realloc = {
+	.attr = FAULT_ATTR_INITIALIZER,
+	.filtered = false,
+};
+
+static bool should_fail_net_realloc_skb(struct sk_buff *skb)
+{
+	struct net_device *net = skb->dev;
+
+	if (skb_realloc.filtered &&
+	    strncmp(net->name, skb_realloc.devname, IFNAMSIZ))
+		/* device name filter set, but names do not match */
+		return false;
+
+	if (!should_fail(&skb_realloc.attr, 1))
+		return false;
+
+	return true;
+}
+ALLOW_ERROR_INJECTION(should_fail_net_realloc_skb, TRUE);
+
+void skb_might_realloc(struct sk_buff *skb)
+{
+	if (!should_fail_net_realloc_skb(skb))
+		return;
+
+	pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
+}
+EXPORT_SYMBOL(skb_might_realloc);
+
+static int __init skb_realloc_setup(char *str)
+{
+	return setup_fault_attr(&skb_realloc.attr, str);
+}
+__setup("skb_realloc=", skb_realloc_setup);
+
+static void reset_settings(void)
+{
+	skb_realloc.filtered = false;
+	memzero_explicit(&skb_realloc.devname, IFNAMSIZ);
+}
+
+static ssize_t devname_write(struct file *file, const char __user *buffer,
+			     size_t count, loff_t *ppos)
+{
+	ssize_t ret;
+
+	reset_settings();
+	ret = simple_write_to_buffer(&skb_realloc.devname, IFNAMSIZ,
+				     ppos, buffer, count);
+	if (ret < 0)
+		return ret;
+	strim(skb_realloc.devname);
+
+	if (strnlen(skb_realloc.devname, IFNAMSIZ))
+		skb_realloc.filtered = true;
+
+	return count;
+}
+
+static ssize_t devname_read(struct file *file,
+			    char __user *buffer,
+			    size_t size, loff_t *ppos)
+{
+	if (!skb_realloc.filtered)
+		return 0;
+
+	return simple_read_from_buffer(buffer, size, ppos, &skb_realloc.devname,
+				       strlen(skb_realloc.devname));
+}
+
+static const struct file_operations devname_ops = {
+	.write = devname_write,
+	.read = devname_read,
+};
+
+static int __init fail_net_force_skb_realloc_debugfs(void)
+{
+	umode_t mode = S_IFREG | 0600;
+	struct dentry *dir;
+
+	dir = fault_create_debugfs_attr("fail_net_force_skb_realloc", NULL,
+					&skb_realloc.attr);
+	if (IS_ERR(dir))
+		return PTR_ERR(dir);
+
+	debugfs_create_file("devname", mode, dir, NULL, &devname_ops);
+
+	return 0;
+}
+
+late_initcall(fail_net_force_skb_realloc_debugfs);
-- 
2.43.5
Re: [PATCH net-next v3] net: Implement fault injection forcing skb reallocation
Posted by Bagas Sanjaya 1 month ago
On Mon, Oct 14, 2024 at 06:50:00AM -0700, Breno Leitao wrote:
> +  To select the interface to act on, write the network name to the following file:
> +  `/sys/kernel/debug/fail_net_force_skb_realloc/devname`
"... write the network name to /sys/kernel/debug/fail_net_force_skb_realloc/devname."
> +  If this field is left empty (which is the default value), skb reallocation
> +  will be forced on all network interfaces.
> +
> <snipped>...
> +- /sys/kernel/debug/fail_net_force_skb_realloc/devname:
> +
> +        Specifies the network interface on which to force SKB reallocation.  If
> +        left empty, SKB reallocation will be applied to all network interfaces.
> +
> +        Example usage:
> +        # Force skb reallocation on eth0
> +        echo "eth0" > /sys/kernel/debug/fail_net_force_skb_realloc/devname
> +
> +        # Clear the selection and force skb reallocation on all interfaces
> +        echo "" > /sys/kernel/debug/fail_net_force_skb_realloc/devname

The examples rendered as normal paragraph instead (and look like long-running
sentences) so I wrap them in literal code blocks:

---- >8 ----
diff --git a/Documentation/fault-injection/fault-injection.rst b/Documentation/fault-injection/fault-injection.rst
index bb19638d53171b..b2bf3afd16d144 100644
--- a/Documentation/fault-injection/fault-injection.rst
+++ b/Documentation/fault-injection/fault-injection.rst
@@ -243,12 +243,13 @@ configuration of fault-injection capabilities.
         Specifies the network interface on which to force SKB reallocation.  If
         left empty, SKB reallocation will be applied to all network interfaces.
 
-        Example usage:
-        # Force skb reallocation on eth0
-        echo "eth0" > /sys/kernel/debug/fail_net_force_skb_realloc/devname
+        Example usage::
 
-        # Clear the selection and force skb reallocation on all interfaces
-        echo "" > /sys/kernel/debug/fail_net_force_skb_realloc/devname
+          # Force skb reallocation on eth0
+          echo "eth0" > /sys/kernel/debug/fail_net_force_skb_realloc/devname
+
+          # Clear the selection and force skb reallocation on all interfaces
+          echo "" > /sys/kernel/debug/fail_net_force_skb_realloc/devname
 
 Boot option
 ^^^^^^^^^^^

Thanks.

-- 
An old man doll... just what I always wanted! - Clara
Re: [PATCH net-next v3] net: Implement fault injection forcing skb reallocation
Posted by Breno Leitao 1 month ago
Hello Bagas,

On Mon, Oct 21, 2024 at 07:35:29PM +0700, Bagas Sanjaya wrote:
> On Mon, Oct 14, 2024 at 06:50:00AM -0700, Breno Leitao wrote:
> > +  To select the interface to act on, write the network name to the following file:
> > +  `/sys/kernel/debug/fail_net_force_skb_realloc/devname`
> "... write the network name to /sys/kernel/debug/fail_net_force_skb_realloc/devname."
> > +  If this field is left empty (which is the default value), skb reallocation
> > +  will be forced on all network interfaces.
> > +
> > <snipped>...
> > +- /sys/kernel/debug/fail_net_force_skb_realloc/devname:
> > +
> > +        Specifies the network interface on which to force SKB reallocation.  If
> > +        left empty, SKB reallocation will be applied to all network interfaces.
> > +
> > +        Example usage:
> > +        # Force skb reallocation on eth0
> > +        echo "eth0" > /sys/kernel/debug/fail_net_force_skb_realloc/devname
> > +
> > +        # Clear the selection and force skb reallocation on all interfaces
> > +        echo "" > /sys/kernel/debug/fail_net_force_skb_realloc/devname
> 
> The examples rendered as normal paragraph instead (and look like long-running
> sentences) so I wrap them in literal code blocks:

Thanks. I will update it, and send a new version.
Re: [PATCH net-next v3] net: Implement fault injection forcing skb reallocation
Posted by Akinobu Mita 1 month, 1 week ago
2024年10月14日(月) 22:50 Breno Leitao <leitao@debian.org>:
> +static int __init skb_realloc_setup(char *str)
> +{
> +       return setup_fault_attr(&skb_realloc.attr, str);
> +}
> +__setup("skb_realloc=", skb_realloc_setup);

The documentation says "fail_net_force_skb_realloc=" boot option,
but this code seems to add "skb_realloc=" boot option.

I don't have a strong opinion about the naming, but I feel like
it's a bit long.  How about "fail_skb_realloc="?

The same goes for the debugfs directory name.

Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com>
Re: [PATCH net-next v3] net: Implement fault injection forcing skb reallocation
Posted by Breno Leitao 1 month ago
On Sat, Oct 19, 2024 at 12:28:03AM +0900, Akinobu Mita wrote:
> 2024年10月14日(月) 22:50 Breno Leitao <leitao@debian.org>:
> > +static int __init skb_realloc_setup(char *str)
> > +{
> > +       return setup_fault_attr(&skb_realloc.attr, str);
> > +}
> > +__setup("skb_realloc=", skb_realloc_setup);
> 
> The documentation says "fail_net_force_skb_realloc=" boot option,
> but this code seems to add "skb_realloc=" boot option.
> 
> I don't have a strong opinion about the naming, but I feel like
> it's a bit long.  How about "fail_skb_realloc="?
> 
> The same goes for the debugfs directory name.
> 
> Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com>

Thanks. I will update and spin a new version.
Re: [PATCH net-next v3] net: Implement fault injection forcing skb reallocation
Posted by Paolo Abeni 1 month, 1 week ago
On 10/14/24 15:50, Breno Leitao wrote:
> Introduce a fault injection mechanism to force skb reallocation. The
> primary goal is to catch bugs related to pointer invalidation after
> potential skb reallocation.
> 
> The fault injection mechanism aims to identify scenarios where callers
> retain pointers to various headers in the skb but fail to reload these
> pointers after calling a function that may reallocate the data. This
> type of bug can lead to memory corruption or crashes if the old,
> now-invalid pointers are used.
> 
> By forcing reallocation through fault injection, we can stress-test code
> paths and ensure proper pointer management after potential skb
> reallocations.
> 
> Add a hook for fault injection in the following functions:
> 
>   * pskb_trim_rcsum()
>   * pskb_may_pull_reason()
>   * pskb_trim()
> 
> As the other fault injection mechanism, protect it under a debug Kconfig
> called CONFIG_FAIL_SKB_FORCE_REALLOC.
> 
> This patch was *heavily* inspired by Jakub's proposal from:
> https://lore.kernel.org/all/20240719174140.47a868e6@kernel.org/
> 
> CC: Akinobu Mita <akinobu.mita@gmail.com>
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Breno Leitao <leitao@debian.org>

I'm sorry to nit-pick, but checkpatch laments that the new command line 
argument lacks documentation in 
Documentation/admin-guide/kernel-parameters.txt, and I feel that could 
be actually useful.

With that, feel free to include my ack in the next revision,

Thanks!

Paolo
Re: [PATCH net-next v3] net: Implement fault injection forcing skb reallocation
Posted by Breno Leitao 1 month ago
On Thu, Oct 17, 2024 at 03:50:41PM +0200, Paolo Abeni wrote:
> On 10/14/24 15:50, Breno Leitao wrote:
> > Introduce a fault injection mechanism to force skb reallocation. The
> > primary goal is to catch bugs related to pointer invalidation after
> > potential skb reallocation.
> > 
> > The fault injection mechanism aims to identify scenarios where callers
> > retain pointers to various headers in the skb but fail to reload these
> > pointers after calling a function that may reallocate the data. This
> > type of bug can lead to memory corruption or crashes if the old,
> > now-invalid pointers are used.
> > 
> > By forcing reallocation through fault injection, we can stress-test code
> > paths and ensure proper pointer management after potential skb
> > reallocations.
> > 
> > Add a hook for fault injection in the following functions:
> > 
> >   * pskb_trim_rcsum()
> >   * pskb_may_pull_reason()
> >   * pskb_trim()
> > 
> > As the other fault injection mechanism, protect it under a debug Kconfig
> > called CONFIG_FAIL_SKB_FORCE_REALLOC.
> > 
> > This patch was *heavily* inspired by Jakub's proposal from:
> > https://lore.kernel.org/all/20240719174140.47a868e6@kernel.org/
> > 
> > CC: Akinobu Mita <akinobu.mita@gmail.com>
> > Suggested-by: Jakub Kicinski <kuba@kernel.org>
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> 
> I'm sorry to nit-pick, but checkpatch laments that the new command line
> argument lacks documentation in
> Documentation/admin-guide/kernel-parameters.txt, and I feel that could be
> actually useful.
> 
> With that, feel free to include my ack in the next revision,

Thanks Paolo, I will send a new updated version soon.