From: mitchell.augustin@canonical.com
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, shuah@kernel.org, netdev@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: jacob.martin@canonical.com, dann.frazier@canonical.com,
	mitchell@mitchellaugustin.com
Subject: [PATCH net-next] selftests: net: add pmtu_bad.sh regression test
Date: Mon, 16 Sep 2024 14:18:57 -0500
Message-Id: <20240916191857.1082092-1-mitchell.augustin@canonical.com>

From: Mitchell Augustin

Introduce a regression test that reproduces a bug in
pmtu_ipv6_ipv6_exception which causes veth_A-R1's reference counter to
remain > 0 after cleanup should have already completed.

On affected kernels/platforms, running this test will result in the
following being displayed repeatedly in dmesg:

unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5

and future attempts to modprobe/rmmod ip6_vti will hang forever.
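
For reference, the failure condition described above can be checked by
hand with something like the sketch below. It is illustrative only and
not part of this patch: has_netdev_leak is a hypothetical helper name,
and it assumes the kernel log text is supplied on stdin (for example,
piped from dmesg).

```shell
#!/bin/bash
# Hypothetical helper, not part of the patch: reads kernel log text on
# stdin and succeeds if it contains the "waiting for <dev> to become free"
# message for the given device.
has_netdev_leak()
{
	local dev=$1

	# -F: match the message as a fixed string, -q: exit status only
	grep -qF -- "unregister_netdevice: waiting for ${dev} to become free"
}

# Possible use on a live system (reading the kernel log may need root):
#   dmesg | has_netdev_leak veth_A-R1 && echo "veth_A-R1 refcount leak seen"
```

Reading from stdin rather than calling dmesg directly keeps the check
usable against a saved log as well as a running system.
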
Link: https://lore.kernel.org/all/CAHTA-uZDaJ-71o+bo8a96TV4ck-8niimztQFaa=QoeNdUm-9wg@mail.gmail.com/
Requested-by: Jakub Kicinski
Signed-off-by: Mitchell Augustin
---
 tools/testing/selftests/net/pmtu_bad.sh | 901 ++++++++++++++++++++++++
 1 file changed, 901 insertions(+)
 create mode 100755 tools/testing/selftests/net/pmtu_bad.sh

diff --git a/tools/testing/selftests/net/pmtu_bad.sh b/tools/testing/selftests/net/pmtu_bad.sh
new file mode 100755
index 000000000000..4bbc9815b661
--- /dev/null
+++ b/tools/testing/selftests/net/pmtu_bad.sh
@@ -0,0 +1,901 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Pinned version of pmtu.sh pmtu_ipv6_ipv6_exception and all dependencies
+# configured to reproduce a kernel bug where the veth_A-R1 device's
+# resource counter remains > 0 after cleanup should have already been
+# completed.
+#
+# On affected kernels/platforms, running this test will result in the following
+# being displayed repeatedly in dmesg:
+# unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 5
+#
+# and future attempts to modprobe/rmmod ip6_vti will hang forever.
+#
+# Link: https://lore.kernel.org/all/CAHTA-uZDaJ-71o+bo8a96TV4ck-8niimztQFaa=QoeNdUm-9wg@mail.gmail.com/
+#
+# BugLink: https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/2072501
+#
+# pmtu.sh:
+# Check that route PMTU values match expectations, and that initial device MTU
+# values are assigned correctly
+#
+# Tests currently implemented:
+#
+# - pmtu_bad
+#	Sets the CPU governor to "performance" for all CPUs, then
+#	runs a pinned, affected version of the pmtu_ipv6_ipv6_exception test with
+#	nexthop objects 100 times. If this causes the following to be output
+#	to dmesg, the test is considered to have failed and returns an error:
+#
+#	unregister_netdevice: waiting for veth_A-R1 to become free
+#
+#	Otherwise, the test passes. After execution of the test, the CPU governor
+#	is restored to its original settings.
+# - pmtu_ipv6_ipv6_exception
+#	Same as pmtu_ipv4_vxlan4, but using a IPv4/IPv6 tunnel over IPv4/IPv6,
+#	instead of VXLAN
+
+# Pinned version of lib.sh
+##############################################################################
+# Defines
+
+WAIT_TIMEOUT=${WAIT_TIMEOUT:=20}
+BUSYWAIT_TIMEOUT=$((WAIT_TIMEOUT * 1000)) # ms
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+# namespace list created by setup_ns
+NS_LIST=()
+
+##############################################################################
+# Helpers
+
+__ksft_status_merge()
+{
+	local a=$1; shift
+	local b=$1; shift
+	local -A weights
+	local weight=0
+
+	local i
+	for i in "$@"; do
+		weights[$i]=$((weight++))
+	done
+
+	if [[ ${weights[$a]} > ${weights[$b]} ]]; then
+		echo "$a"
+		return 0
+	else
+		echo "$b"
+		return 1
+	fi
+}
+
+ksft_status_merge()
+{
+	local a=$1; shift
+	local b=$1; shift
+
+	__ksft_status_merge "$a" "$b" \
+		$ksft_pass $ksft_xfail $ksft_skip $ksft_fail
+}
+
+ksft_exit_status_merge()
+{
+	local a=$1; shift
+	local b=$1; shift
+
+	__ksft_status_merge "$a" "$b" \
+		$ksft_xfail $ksft_pass $ksft_skip $ksft_fail
+}
+
+loopy_wait()
+{
+	local sleep_cmd=$1; shift
+	local timeout_ms=$1; shift
+
+	local start_time="$(date -u +%s%3N)"
+	while true
+	do
+		local out
+		if out=$("$@"); then
+			echo -n "$out"
+			return 0
+		fi
+
+		local current_time="$(date -u +%s%3N)"
+		if ((current_time - start_time > timeout_ms)); then
+			echo -n "$out"
+			return 1
+		fi
+
+		$sleep_cmd
+	done
+}
+
+busywait()
+{
+	local timeout_ms=$1; shift
+
+	loopy_wait : "$timeout_ms" "$@"
+}
+
+# timeout in seconds
+slowwait()
+{
+	local timeout_sec=$1; shift
+
+	loopy_wait "sleep 0.1" "$((timeout_sec * 1000))" "$@"
+}
+
+until_counter_is()
+{
+	local expr=$1; shift
+	local current=$("$@")
+
+	echo $((current))
+	((current $expr))
+}
+
+busywait_for_counter()
+{
+	local timeout=$1; shift
+	local delta=$1; shift
+
+	local base=$("$@")
+	busywait "$timeout" until_counter_is ">= $((base + delta))" "$@"
+}
+
+slowwait_for_counter()
+{
+	local timeout=$1; shift
+	local delta=$1; shift
+
+	local base=$("$@")
+	slowwait "$timeout" until_counter_is ">= $((base + delta))" "$@"
+}
+
+cleanup_ns()
+{
+	local ns=""
+	local errexit=0
+	local ret=0
+
+	# disable errexit temporary
+	if [[ $- =~ "e" ]]; then
+		errexit=1
+		set +e
+	fi
+
+	for ns in "$@"; do
+		[ -z "${ns}" ] && continue
+		ip netns delete "${ns}" &> /dev/null
+		if ! busywait $BUSYWAIT_TIMEOUT ip netns list \| grep -vq "^$ns$" &> /dev/null; then
+			echo "Warn: Failed to remove namespace $ns"
+			ret=1
+		fi
+	done
+
+	[ $errexit -eq 1 ] && set -e
+	return $ret
+}
+
+cleanup_all_ns()
+{
+	cleanup_ns "${NS_LIST[@]}"
+}
+
+# setup netns with given names as prefix. e.g
+# setup_ns local remote
+setup_ns()
+{
+	local ns=""
+	local ns_name=""
+	local ns_list=()
+	local ns_exist=
+	for ns_name in "$@"; do
+		# Some test may setup/remove same netns multi times
+		if unset ${ns_name} 2> /dev/null; then
+			ns="${ns_name,,}-$(mktemp -u XXXXXX)"
+			eval readonly ${ns_name}="$ns"
+			ns_exist=false
+		else
+			eval ns='$'${ns_name}
+			cleanup_ns "$ns"
+			ns_exist=true
+		fi
+
+		if ! ip netns add "$ns"; then
+			echo "Failed to create namespace $ns_name"
+			cleanup_ns "${ns_list[@]}"
+			return $ksft_skip
+		fi
+		ip -n "$ns" link set lo up
+		! $ns_exist && ns_list+=("$ns")
+	done
+	NS_LIST+=("${ns_list[@]}")
+}
+
+tc_rule_stats_get()
+{
+	local dev=$1; shift
+	local pref=$1; shift
+	local dir=$1; shift
+	local selector=${1:-.packets}; shift
+
+	tc -j -s filter show dev $dev ${dir:-ingress} pref $pref \
+	    | jq ".[1].options.actions[].stats$selector"
+}
+
+tc_rule_handle_stats_get()
+{
+	local id=$1; shift
+	local handle=$1; shift
+	local selector=${1:-.packets}; shift
+	local netns=${1:-""}; shift
+
+	tc $netns -j -s filter show $id \
+	    | jq ".[] | select(.options.handle == $handle) | \
+		  .options.actions[0].stats$selector"
+}
+
+# Pinned version of net_helper.sh
+wait_local_port_listen()
+{
+	local listener_ns="${1}"
+	local port="${2}"
+	local protocol="${3}"
+	local pattern
+	local i
+
+	pattern=":$(printf "%04X" "${port}") "
+
+	# for tcp protocol additionally check the socket state
+	[ ${protocol} = "tcp" ] && pattern="${pattern}0A"
+	for i in $(seq 10); do
+		if ip netns exec "${listener_ns}" awk '{print $2" "$4}' \
+		   /proc/net/"${protocol}"* | grep -q "${pattern}"; then
+			break
+		fi
+		sleep 0.1
+	done
+}
+
+
+
+PAUSE_ON_FAIL=no
+VERBOSE=0
+TRACING=0
+
+# Some systems don't have a ping6 binary anymore
+which ping6 > /dev/null 2>&1 && ping6=$(which ping6) || ping6=$(which ping)
+
+#	Name		Description							re-run with nh
+tests="
+	pmtu_bad	Runs IPv6 over IPv6: PMTU exceptions 100x w/ performance governor	1"
+
+# Addressing and routing for tests with routers: four network segments, with
+# index SEGMENT between 1 and 4, a common prefix (PREFIX4 or PREFIX6) and an
+# identifier ID, which is 1 for hosts (A and B), 2 for routers (R1 and R2).
+# Addresses are:
+# - IPv4: PREFIX4.SEGMENT.ID (/24)
+# - IPv6: PREFIX6:SEGMENT::ID (/64)
+prefix4="10.0"
+prefix6="fc00"
+a_r1=1
+a_r2=2
+b_r1=3
+b_r2=4
+#	ns	peer	segment
+routing_addrs="
+	A	R1	${a_r1}
+	A	R2	${a_r2}
+	B	R1	${b_r1}
+	B	R2	${b_r2}
+"
+# Traffic from A to B goes through R1 by default, and through R2, if destined to
+# B's address on the b_r2 segment.
+# Traffic from B to A goes through R1.
+#	ns	destination		gateway
+routes="
+	A	default			${prefix4}.${a_r1}.2
+	A	${prefix4}.${b_r2}.1	${prefix4}.${a_r2}.2
+	B	default			${prefix4}.${b_r1}.2
+
+	A	default			${prefix6}:${a_r1}::2
+	A	${prefix6}:${b_r2}::1	${prefix6}:${a_r2}::2
+	B	default			${prefix6}:${b_r1}::2
+"
+USE_NH="no"
+#	ns	family	nh id	destination		gateway
+nexthops="
+	A	4	41	${prefix4}.${a_r1}.2	veth_A-R1
+	A	4	42	${prefix4}.${a_r2}.2	veth_A-R2
+	B	4	41	${prefix4}.${b_r1}.2	veth_B-R1
+
+	A	6	61	${prefix6}:${a_r1}::2	veth_A-R1
+	A	6	62	${prefix6}:${a_r2}::2	veth_A-R2
+	B	6	61	${prefix6}:${b_r1}::2	veth_B-R1
+"
+
+# nexthop id correlates to id in nexthops config above
+#	ns	family	prefix			nh id
+routes_nh="
+	A	4	default			41
+	A	4	${prefix4}.${b_r2}.1	42
+	B	4	default			41
+
+	A	6	default			61
+	A	6	${prefix6}:${b_r2}::1	62
+	B	6	default			61
+"
+
+policy_mark=0x04
+rt_table=main
+
+veth4_a_addr="192.168.1.1"
+veth4_b_addr="192.168.1.2"
+veth4_c_addr="192.168.2.10"
+veth4_mask="24"
+veth6_a_addr="fd00:1::a"
+veth6_b_addr="fd00:1::b"
+veth6_c_addr="fd00:2::c"
+veth6_mask="64"
+
+tunnel4_a_addr="192.168.2.1"
+tunnel4_b_addr="192.168.2.2"
+tunnel4_mask="24"
+tunnel6_a_addr="fd00:2::a"
+tunnel6_b_addr="fd00:2::b"
+tunnel6_mask="64"
+
+dummy6_0_prefix="fc00:1000::"
+dummy6_1_prefix="fc00:1001::"
+dummy6_mask="64"
+
+err_buf=
+tcpdump_pids=
+nettest_pids=
+socat_pids=
+tmpoutfile=
+
+err() {
+	err_buf="${err_buf}${1}
+"
+}
+
+err_flush() {
+	echo -n "${err_buf}"
+	err_buf=
+}
+
+run_cmd() {
+	cmd="$*"
+
+	if [ "$VERBOSE" = "1" ]; then
+		printf " COMMAND: $cmd\n"
+	fi
+
+	out="$($cmd 2>&1)"
+	rc=$?
+	if [ "$VERBOSE" = "1" -a -n "$out" ]; then
+		echo " $out"
+		echo
+	fi
+
+	return $rc
+}
+
+run_cmd_bg() {
+	cmd="$*"
+
+	if [ "$VERBOSE" = "1" ]; then
+		printf " COMMAND: %s &\n" "${cmd}"
+	fi
+
+	$cmd 2>&1 &
+}
+
+# Find the auto-generated name for this namespace
+nsname() {
+	eval echo \$NS_$1
+}
+
+setup_ipvX_over_ipvY() {
+	inner=${1}
+	outer=${2}
+
+	if [ "${outer}" -eq 4 ]; then
+		a_addr="${prefix4}.${a_r1}.1"
+		b_addr="${prefix4}.${b_r1}.1"
+		if [ "${inner}" -eq 4 ]; then
+			type="ipip"
+			mode="ipip"
+		else
+			type="sit"
+			mode="ip6ip"
+		fi
+	else
+		a_addr="${prefix6}:${a_r1}::1"
+		b_addr="${prefix6}:${b_r1}::1"
+		type="ip6tnl"
+		if [ "${inner}" -eq 4 ]; then
+			mode="ipip6"
+		else
+			mode="ip6ip6"
+		fi
+	fi
+
+	run_cmd ${ns_a} ip link add ip_a type ${type} local ${a_addr} remote ${b_addr} mode ${mode} || return $ksft_skip
+	run_cmd ${ns_b} ip link add ip_b type ${type} local ${b_addr} remote ${a_addr} mode ${mode}
+
+	run_cmd ${ns_a} ip link set ip_a up
+	run_cmd ${ns_b} ip link set ip_b up
+
+	if [ "${inner}" = "4" ]; then
+		run_cmd ${ns_a} ip addr add ${tunnel4_a_addr}/${tunnel4_mask} dev ip_a
+		run_cmd ${ns_b} ip addr add ${tunnel4_b_addr}/${tunnel4_mask} dev ip_b
+	else
+		run_cmd ${ns_a} ip addr add ${tunnel6_a_addr}/${tunnel6_mask} dev ip_a
+		run_cmd ${ns_b} ip addr add ${tunnel6_b_addr}/${tunnel6_mask} dev ip_b
+	fi
+}
+
+setup_ip6ip6() {
+	setup_ipvX_over_ipvY 6 6
+}
+
+setup_namespaces() {
+	setup_ns NS_A NS_B NS_C NS_R1 NS_R2
+	for n in ${NS_A} ${NS_B} ${NS_C} ${NS_R1} ${NS_R2}; do
+		# Disable DAD, so that we don't have to wait to use the
+		# configured IPv6 addresses
+		ip netns exec ${n} sysctl -q net/ipv6/conf/default/accept_dad=0
+	done
+	ns_a="ip netns exec ${NS_A}"
+	ns_b="ip netns exec ${NS_B}"
+	ns_c="ip netns exec ${NS_C}"
+	ns_r1="ip netns exec ${NS_R1}"
+	ns_r2="ip netns exec ${NS_R2}"
+}
+
+
+setup_routing_old() {
+	for i in ${routes}; do
+		[ "${ns}" = "" ] && ns="${i}" && continue
+		[ "${addr}" = "" ] && addr="${i}" && continue
+		[ "${gw}" = "" ] && gw="${i}"
+
+		ns_name="$(nsname ${ns})"
+
+		ip -n "${ns_name}" route add "${addr}" table "${rt_table}" via "${gw}"
+
+		ns=""; addr=""; gw=""
+	done
+}
+
+setup_routing_new() {
+	for i in ${nexthops}; do
+		[ "${ns}" = "" ] && ns="${i}" && continue
+		[ "${fam}" = "" ] && fam="${i}" && continue
+		[ "${nhid}" = "" ] && nhid="${i}" && continue
+		[ "${gw}" = "" ] && gw="${i}" && continue
+		[ "${dev}" = "" ] && dev="${i}"
+
+		ns_name="$(nsname ${ns})"
+
+		ip -n ${ns_name} -${fam} nexthop add id ${nhid} via ${gw} dev ${dev}
+
+		ns=""; fam=""; nhid=""; gw=""; dev=""
+
+	done
+
+	for i in ${routes_nh}; do
+		[ "${ns}" = "" ] && ns="${i}" && continue
+		[ "${fam}" = "" ] && fam="${i}" && continue
+		[ "${addr}" = "" ] && addr="${i}" && continue
+		[ "${nhid}" = "" ] && nhid="${i}"
+
+		ns_name="$(nsname ${ns})"
+
+		ip -n "${ns_name}" -"${fam}" route add "${addr}" table "${rt_table}" nhid "${nhid}"
+
+		ns=""; fam=""; addr=""; nhid=""
+	done
+}
+
+setup_routing() {
+	for i in ${NS_R1} ${NS_R2}; do
+		ip netns exec ${i} sysctl -q net/ipv4/ip_forward=1
+		ip netns exec ${i} sysctl -q net/ipv6/conf/all/forwarding=1
+	done
+
+	for i in ${routing_addrs}; do
+		[ "${ns}" = "" ] && ns="${i}" && continue
+		[ "${peer}" = "" ] && peer="${i}" && continue
+		[ "${segment}" = "" ] && segment="${i}"
+
+		ns_name="$(nsname ${ns})"
+		peer_name="$(nsname ${peer})"
+		if="veth_${ns}-${peer}"
+		ifpeer="veth_${peer}-${ns}"
+
+		# Create veth links
+		ip link add ${if} up netns ${ns_name} type veth peer name ${ifpeer} netns ${peer_name} || return 1
+		ip -n ${peer_name} link set dev ${ifpeer} up
+
+		# Add addresses
+		ip -n ${ns_name} addr add ${prefix4}.${segment}.1/24 dev ${if}
+		ip -n ${ns_name} addr add ${prefix6}:${segment}::1/64 dev ${if}
+
+		ip -n ${peer_name} addr add ${prefix4}.${segment}.2/24 dev ${ifpeer}
+		ip -n ${peer_name} addr add ${prefix6}:${segment}::2/64 dev ${ifpeer}
+
+		ns=""; peer=""; segment=""
+	done
+
+	if [ "$USE_NH" = "yes" ]; then
+		setup_routing_new
+	else
+		setup_routing_old
+	fi
+
+	return 0
+}
+
+
+setup() {
+	[ "$(id -u)" -ne 0 ] && echo " need to run as root" && return $ksft_skip
+
+	for arg do
+		eval setup_${arg} || { echo " ${arg} not supported"; return 1; }
+	done
+}
+
+trace() {
+	[ $TRACING -eq 0 ] && return
+
+	for arg do
+		[ "${ns_cmd}" = "" ] && ns_cmd="${arg}" && continue
+		${ns_cmd} tcpdump --immediate-mode -s 0 -i "${arg}" -w "${name}_${arg}.pcap" 2> /dev/null &
+		tcpdump_pids="${tcpdump_pids} $!"
+		ns_cmd=
+	done
+	sleep 1
+}
+
+
+restore_governors() {
+	echo "Restoring original CPU governors"
+	while IFS=' ' read -r cpu governor; do
+		echo "$governor" | tee "/sys/devices/system/cpu/$cpu/cpufreq/scaling_governor" > /dev/null
+	done < "$GOVERNOR_STATE_FILE"
+	rm -rf "$STATE_FILE_DIR" 2>/dev/null
+}
+
+
+cleanup() {
+	for pid in ${tcpdump_pids}; do
+		kill ${pid}
+	done
+	tcpdump_pids=
+
+	for pid in ${nettest_pids}; do
+		kill ${pid}
+	done
+	nettest_pids=
+
+	for pid in ${socat_pids}; do
+		kill "${pid}"
+	done
+	socat_pids=
+
+	cleanup_all_ns
+
+	ip link del veth_A-C 2>/dev/null
+	ip link del veth_A-R1 2>/dev/null
+	ovs-vsctl --if-exists del-port vxlan_a 2>/dev/null
+	ovs-vsctl --if-exists del-br ovs_br0 2>/dev/null
+	rm -f "$tmpoutfile"
+}
+
+mtu() {
+	ns_cmd="${1}"
+	dev="${2}"
+	mtu="${3}"
+
+	${ns_cmd} ip link set dev ${dev} mtu ${mtu}
+}
+
+mtu_parse() {
+	input="${1}"
+
+	next=0
+	for i in ${input}; do
+		[ ${next} -eq 1 -a "${i}" = "lock" ] && next=2 && continue
+		[ ${next} -eq 1 ] && echo "${i}" && return
+		[ ${next} -eq 2 ] && echo "lock ${i}" && return
+		[ "${i}" = "mtu" ] && next=1
+	done
+}
+
+link_get() {
+	ns_cmd="${1}"
+	name="${2}"
+
+	${ns_cmd} ip link show dev "${name}"
+}
+
+link_get_mtu() {
+	ns_cmd="${1}"
+	name="${2}"
+
+	mtu_parse "$(link_get "${ns_cmd}" ${name})"
+}
+
+route_get_dst_exception() {
+	ns_cmd="${1}"
+	dst="${2}"
+	dsfield="${3}"
+
+	if [ -z "${dsfield}" ]; then
+		dsfield=0
+	fi
+
+	${ns_cmd} ip route get "${dst}" dsfield "${dsfield}"
+}
+
+route_get_dst_pmtu_from_exception() {
+	ns_cmd="${1}"
+	dst="${2}"
+	dsfield="${3}"
+
+	mtu_parse "$(route_get_dst_exception "${ns_cmd}" "${dst}" "${dsfield}")"
+}
+
+check_pmtu_value() {
+	expected="${1}"
+	value="${2}"
+	event="${3}"
+
+	[ "${expected}" = "any" ] && [ -n "${value}" ] && return 0
+	[ "${value}" = "${expected}" ] && return 0
+	[ -z "${value}" ] && err " PMTU exception wasn't created after ${event}" && return 1
+	[ -z "${expected}" ] && err " PMTU exception shouldn't exist after ${event}" && return 1
+	err " found PMTU exception with incorrect MTU ${value}, expected ${expected}, after ${event}"
+	return 1
+}
+
+
+test_pmtu_ipvX_over_ipvY_exception() {
+	inner=${1}
+	outer=${2}
+	ll_mtu=4000
+
+	setup namespaces routing ip${inner}ip${outer} || return $ksft_skip
+
+	trace "${ns_a}" ip_a "${ns_b}" ip_b \
+	      "${ns_a}" veth_A-R1 "${ns_r1}" veth_R1-A \
+	      "${ns_b}" veth_B-R1 "${ns_r1}" veth_R1-B
+
+	if [ ${inner} -eq 4 ]; then
+		ping=ping
+		dst=${tunnel4_b_addr}
+	else
+		ping=${ping6}
+		dst=${tunnel6_b_addr}
+	fi
+
+	if [ ${outer} -eq 4 ]; then
+		# IPv4 header
+		exp_mtu=$((${ll_mtu} - 20))
+	else
+		# IPv6 header Option 4
+		exp_mtu=$((${ll_mtu} - 40 - 8))
+	fi
+
+	# Create route exception by exceeding link layer MTU
+	mtu "${ns_a}" veth_A-R1 $((${ll_mtu} + 1000))
+	mtu "${ns_r1}" veth_R1-A $((${ll_mtu} + 1000))
+	mtu "${ns_b}" veth_B-R1 ${ll_mtu}
+	mtu "${ns_r1}" veth_R1-B ${ll_mtu}
+
+	mtu "${ns_a}" ip_a $((${ll_mtu} + 1000)) || return
+	mtu "${ns_b}" ip_b $((${ll_mtu} + 1000)) || return
+	run_cmd ${ns_a} ${ping} -q -M want -i 0.1 -w 1 -s $((${ll_mtu} + 500)) ${dst}
+
+	# Check that exception was created
+	pmtu="$(route_get_dst_pmtu_from_exception "${ns_a}" ${dst})"
+	check_pmtu_value ${exp_mtu} "${pmtu}" "exceeding link layer MTU on ip${inner}ip${outer} interface"
+}
+
+test_pmtu_ipv6_ipv6_exception() {
+	test_pmtu_ipvX_over_ipvY_exception 6 6
+}
+
+run_test() {
+	(
+	tname="$1"
+	tdesc="$2"
+
+	unset IFS
+
+	# Since cleanup() relies on variables modified by this subshell, it
+	# has to run in this context.
+	trap cleanup EXIT
+
+	if [ "$VERBOSE" = "1" ]; then
+		printf "\n##########################################################################\n\n"
+	fi
+
+	eval test_${tname}
+	ret=$?
+
+	if [ $ret -eq 0 ]; then
+		printf "TEST: %-60s [ OK ]\n" "${tdesc}"
+	elif [ $ret -eq 1 ]; then
+		printf "TEST: %-60s [FAIL]\n" "${tdesc}"
+		if [ "${PAUSE_ON_FAIL}" = "yes" ]; then
+			echo
+			echo "Pausing. Hit enter to continue"
+			read a
+		fi
+		err_flush
+		exit 1
+	elif [ $ret -eq $ksft_skip ]; then
+		printf "TEST: %-60s [SKIP]\n" "${tdesc}"
+		err_flush
+	fi
+
+	return $ret
+	)
+	ret=$?
+	case $ret in
+		0)
+			all_skipped=false
+			[ $exitcode -eq $ksft_skip ] && exitcode=0
+		;;
+		$ksft_skip)
+			[ $all_skipped = true ] && exitcode=$ksft_skip
+		;;
+		*)
+			all_skipped=false
+			exitcode=1
+		;;
+	esac
+
+	return $ret
+}
+
+run_test_nh() {
+	tname="$1"
+	tdesc="$2"
+
+	USE_NH=yes
+	run_test "${tname}" "${tdesc} - nexthop objects"
+	USE_NH=no
+}
+
+usage() {
+	echo
+	echo "$0 [OPTIONS]..."
+	echo "Runs pmtu_ipv6_ipv6_exception test 100x"
+	echo
+	echo "Options"
+	echo " --trace: capture traffic to TEST_INTERFACE.pcap"
+	echo
+	echo "Available tests${tests}"
+	exit 1
+}
+
+################################################################################
+#
+
+[ "$(id -u)" -ne 0 ] && echo "ERROR: need to run as root" && exit 1
+exitcode=0
+desc=0
+all_skipped=true
+
+while getopts :ptv o
+do
+	case $o in
+	p) PAUSE_ON_FAIL=yes;;
+	v) VERBOSE=1;;
+	t) if which tcpdump > /dev/null 2>&1; then
+		TRACING=1
+	   else
+		echo "=== tcpdump not available, tracing disabled"
+	   fi
+	   ;;
+	*) usage;;
+	esac
+done
+shift $(($OPTIND-1))
+
+IFS="
+"
+
+for arg do
+	# Check first that all requested tests are available before running any
+	command -v > /dev/null "test_${arg}" || { echo "=== Test ${arg} not found"; usage; }
+done
+
+trap cleanup EXIT
+
+# start clean
+cleanup
+
+HAVE_NH=no
+ip nexthop ls >/dev/null 2>&1
+[ $? -eq 0 ] && HAVE_NH=yes
+
+name=""
+desc=""
+rerun_nh=0
+
+# Set to performance governor so we can reproduce the bug
+STATE_FILE_DIR=$(mktemp -d)
+if [[ ! -d "$STATE_FILE_DIR" ]]; then
+	echo "Could not create temp dir, skipping this test" >&2
+	exit 1
+fi
+
+# Save current CPU governor state to temp file
+GOVERNOR_STATE_FILE="$STATE_FILE_DIR/orig_cpu_governors.txt"
+touch $GOVERNOR_STATE_FILE
+if [[ ! -e "$GOVERNOR_STATE_FILE" ]]; then
+	echo "Could not save current performance governor states, skipping this test" >&2
+	exit 1
+fi
+echo "Saving current performance governor settings"
+for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
+	if [[ $cpu =~ ^/sys/devices/system/cpu/(cpu[0-9]+)/cpufreq/scaling_governor$ ]]; then
+		echo "${BASH_REMATCH[1]} $(cat $cpu)" >> "$GOVERNOR_STATE_FILE"
+	fi
+done
+
+# Restore original CPU governor settings
+trap restore_governors EXIT
+
+echo "Switching CPU governor to performance"
+for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
+	if [[ $cpu =~ ^/sys/devices/system/cpu/(cpu[0-9]+)/cpufreq/scaling_governor$ ]]; then
+		echo "performance" | tee $cpu > /dev/null
+	fi
+done
+
+for i in $(seq 1 100);
+do
+	name="pmtu_ipv6_ipv6_exception"
+	desc="Bad PMTU behavior"
+
+	if [ "${HAVE_NH}" = "yes" ]; then
+		rerun_nh="${t}"
+	fi
+
+	run_this=1
+	for arg do
+		[ "${arg}" != "${arg#--*}" ] && continue
+		[ "${arg}" = "${name}" ] && run_this=1 && break
+		run_this=0
+	done
+	if [ $run_this -eq 1 ]; then
+		run_test_nh "${name}" "${desc}"
+	fi
+	name=""
+	desc=""
+	rerun_nh=0
+done
+
+if dmesg | grep -q "unregister_netdevice: waiting for veth_A-R1 to become free"; then
+	printf "TEST: Bad PMTU behavior - veth_A-R1 refcount error reproducer [FAIL]\n"
+	printf "veth_A-R1 has not been released properly\n"
+	exitcode=1
+else
+	printf "TEST: Bad PMTU behavior - veth_A-R1 refcount error reproducer [PASS]\n"
+	printf "No veth_A-R1 errors, considering this test passed\n"
+	exitcode=0
+fi
+
+exit ${exitcode}
-- 
2.43.0