[v1] netconsole: Fix userdata race condition

[PATCH net 1/2] selftests: netconsole: Add race condition test for userdata corruption

Posted by Gustavo Luiz Duarte 3 months, 2 weeks ago

Add a test to verify that netconsole userdata handling is properly
synchronized under concurrent read/write operations. The test creates
two competing loops: one continuously sending netconsole messages
(which read userdata), and another rapidly alternating userdata values
between two distinct 198-byte patterns filled with 'A' and 'B'
characters.

Without proper synchronization, concurrent reads and writes could result
in torn reads where a message contains mixed userdata (e.g., starting
with 'A' but containing 'B', or vice versa). The test monitors 10,000
messages and fails if it detects any such corruption, ensuring that the
netconsole implementation maintains data consistency through proper
locking mechanisms.

This test validates the fix for potential race conditions in the
netconsole userdata path and serves as a regression test to prevent
similar issues in the future.

Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com>
---
 .../selftests/drivers/net/netcons_race_userdata.sh | 87 ++++++++++++++++++++++
 1 file changed, 87 insertions(+)

diff --git a/tools/testing/selftests/drivers/net/netcons_race_userdata.sh b/tools/testing/selftests/drivers/net/netcons_race_userdata.sh
new file mode 100755
index 0000000000000..d6574f0364ead
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/netcons_race_userdata.sh
@@ -0,0 +1,87 @@
+#!/usr/bin/env bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test verifies that netconsole userdata remains consistent under concurrent
+# read/write operations. It creates two loops: one continuously writing netconsole
+# messages (which read userdata) and another rapidly alternating userdata values
+# between two distinct patterns. The test checks that no message contains corrupted
+# or mixed userdata, ensuring proper synchronization in the netconsole implementation.
+#
+# Author: Gustavo Luiz Duarte <gustavold@gmail.com>
+
+set -euo pipefail
+
+SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")
+
+source "${SCRIPTDIR}"/lib/sh/lib_netcons.sh
+
+function loop_set_userdata() {
+	MSGA=$(printf 'A%.0s' {1..198})
+	MSGB=$(printf 'B%.0s' {1..198})
+
+	while true; do
+		echo "$MSGA" > "${NETCONS_PATH}/userdata/${USERDATA_KEY}/value"
+		echo "$MSGB" > "${NETCONS_PATH}/userdata/${USERDATA_KEY}/value"
+	done
+}
+
+function loop_print_msg() {
+	while true; do
+		echo "test msg" > /dev/kmsg
+	done
+}
+
+cleanup_children() {
+	pkill_socat
+	# Remove the namespace, interfaces and netconsole target
+	cleanup
+	kill $child1 $child2 2> /dev/null || true
+	wait $child1 $child2 2> /dev/null || true
+}
+
+modprobe netdevsim 2> /dev/null || true
+modprobe netconsole 2> /dev/null || true
+
+OUTPUT_FILE="stdout"
+# Check for basic system dependency and exit if not found
+check_for_dependencies
+# Set current loglevel to KERN_INFO(6), and default to KERN_NOTICE(5)
+echo "6 5" > /proc/sys/kernel/printk
+# kill child processes and remove interfaces on exit
+trap cleanup_children EXIT
+
+# Create one namespace and two interfaces
+set_network
+# Create a dynamic target for netconsole
+create_dynamic_target
+# Set userdata "key" with the "value" value
+set_user_data
+
+# Start userdata read loop (printk)
+loop_print_msg &
+child1=$!
+
+# Start userdata write loop
+loop_set_userdata &
+child2=$!
+
+# Start socat to listen for netconsole messages and check for corrupted userdata.
+MAX_COUNT=10000
+i=0
+while read line; do
+	if [ $i -ge $MAX_COUNT ]; then
+		echo "Test passed."
+		exit ${ksft_pass}
+	fi
+
+	if [[ "$line" == "key=A"* && "$line" == *"B"* ||
+	      "$line" == "key=B"* && "$line" == *"A"* ]]; then
+		echo "Test failed. Found corrupted userdata: $line"
+		exit ${ksft_fail}
+	fi
+
+	i=$((i + 1))
+done < <(listen_port_and_save_to ${OUTPUT_FILE} 2> /dev/null)
+
+echo "socat died before we could check $MAX_COUNT messages. Skipping test. ${ksft_skip}"
+exit ${ksft_skip}

-- 
2.47.3

Re: [PATCH net 1/2] selftests: netconsole: Add race condition test for userdata corruption

Posted by Andre Carvalho 3 months, 2 weeks ago

Hi Gustavo,

On Mon, Oct 20, 2025 at 02:22:34PM -0700, Gustavo Luiz Duarte wrote:
> This test validates the fix for potential race conditions in the
> netconsole userdata path and serves as a regression test to prevent
> similar issues in the future.

I noticed the test was not added to the TEST_PROGS in the Makefile like other
selftests. Is that intentional? 

You might also need to change the order of the patches in the series to
make sure the test passes in CI.

> +cleanup_children() {
> +	pkill_socat
> +	# Remove the namespace, interfaces and netconsole target
> +	cleanup
> +	kill $child1 $child2 2> /dev/null || true
> +	wait $child1 $child2 2> /dev/null || true
> +}

Calling cleanup before stopping loop_set_userdata causes writing the userdata to
fail. You might want to move the kill and wait lines to before call to cleanup.
Additionally, shellcheck also suggests wrapping $child1 and $child2 with double
quotes.

Re: [PATCH net 1/2] selftests: netconsole: Add race condition test for userdata corruption

Posted by Gustavo Luiz Duarte 3 months, 2 weeks ago

On Mon, Oct 20, 2025 at 8:14 PM Andre Carvalho <asantostc@gmail.com> wrote:
>
> Hi Gustavo,
>
> On Mon, Oct 20, 2025 at 02:22:34PM -0700, Gustavo Luiz Duarte wrote:
> > This test validates the fix for potential race conditions in the
> > netconsole userdata path and serves as a regression test to prevent
> > similar issues in the future.
>
> I noticed the test was not added to the TEST_PROGS in the Makefile like other
> selftests. Is that intentional?
>
> You might also need to change the order of the patches in the series to
> make sure the test passes in CI.
>
> > +cleanup_children() {
> > +     pkill_socat
> > +     # Remove the namespace, interfaces and netconsole target
> > +     cleanup
> > +     kill $child1 $child2 2> /dev/null || true
> > +     wait $child1 $child2 2> /dev/null || true
> > +}
>
> Calling cleanup before stopping loop_set_userdata causes writing the userdata to
> fail. You might want to move the kill and wait lines to before call to cleanup.
> Additionally, shellcheck also suggests wrapping $child1 and $child2 with double
> quotes.

Thanks for reviewing, Andre! I'm sending v2 with your suggestions.