net/netfilter/xt_TCPMSS.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)
Padding TCP options with NOPs is optional, so it is legal to send an
MSS option that is not aligned to a word boundary and therefore not
aligned for checksum calculation. The current TCPMSS target is not
robust to this: when the MSS option is unaligned it produces an
invalid checksum, and the packet is dropped.
This has not been observed in any real environment. Senders place the
MSS at the beginning of the options block, where it is naturally
aligned, but the spec allows unaligned options and the kernel shouldn't
silently drop legal packets.
When the changed word is not aligned, the modified bytes straddle two
checksum words, and using the standard incremental update helper
(which assumes alignment) produces an invalid checksum:
| w1 | w2 |
OLD | a b | c d |
NEW | a b' | c' d |
Since b' and c' sit across w1 and w2, we could compute the incremental
checksum in two operations by recalculating w1 and then w2:
C' = C - w1 + w1' - w2 + w2'
But working it out:
C' = C - w1 - w2 + w1' + w2'
= C - (a * 2^8 + b) - (c * 2^8 + d)
+ (a * 2^8 + b') + (c' * 2^8 + d)
= C + 2^8 * (a - a + c' - c) + (b' - b + d - d)
= C + 2^8 * (c' - c) + (b' - b)
= C - (2^8 * c + b) + (2^8 * c' + b')
So an unaligned incremental checksum can be done in a single operation
by byteswapping the changed bytes before passing them to the helper.
This patch implements that trick for unaligned MSS option updates.
Signed-off-by: Kacper Kokot <kacper.kokot.44@gmail.com>
---
I decided to go with the get_unaligned_be16 suggestion because
it's idiomatic and it produces shorter assembly on x86-64
(6 instructions vs 9). SYN processing is a cold path so
I didn't look into it further.
v2:
- Use get_unaligned_be16 (Fernando's suggestion)
- Fix alignment check expression (David)
- Mention it's a theoretical bug in the commit message
- Drop cc stable, the bug is only theoretical
net/netfilter/xt_TCPMSS.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c
index 80e1634bc51f..32c87a520361 100644
--- a/net/netfilter/xt_TCPMSS.c
+++ b/net/netfilter/xt_TCPMSS.c
@ -117,8 +117,9 @@ tcpmss_mangle_packet(struct sk_buff *skb,
for (i = sizeof(struct tcphdr); i <= tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
if (opt[i] == TCPOPT_MSS && opt[i+1] == TCPOLEN_MSS) {
u_int16_t oldmss;
+ u16 csum_oldmss, csum_newmss;
- oldmss = (opt[i+2] << 8) | opt[i+3];
+ oldmss = get_unaligned_be16(&opt[i+2]);
/* Never increase MSS, even when setting it, as
* doing so results in problems for hosts that rely
@@ -130,8 +131,19 @@ tcpmss_mangle_packet(struct sk_buff *skb,
opt[i+2] = (newmss & 0xff00) >> 8;
opt[i+3] = newmss & 0x00ff;
+ csum_oldmss = htons(oldmss);
+ csum_newmss = htons(newmss);
+
+ /* MSS may be unaligned; fix up the incremental checksum
+ * to avoid an invalid checksum and a dropped packet.
+ */
+ if (((char *)&opt[i + 2] - (char *)tcph) & 0x1) {
+ csum_oldmss = swab16(csum_oldmss);
+ csum_newmss = swab16(csum_newmss);
+ }
+
inet_proto_csum_replace2(&tcph->check, skb,
- htons(oldmss), htons(newmss),
+ csum_oldmss, csum_newmss,
false);
return 0;
}
--
2.43.0
On 5/29/26 12:34 AM, Kacper Kokot wrote:
> Padding TCP options with NOPs is optional, so it is legal to send an
> MSS option that is not aligned to a word boundary and therefore not
> aligned for checksum calculation. The current TCPMSS target is not
> robust to this: when the MSS option is unaligned it produces an
> invalid checksum, and the packet is dropped.
>
> This has not been observed in any real environment. Senders place the
> MSS at the beginning of the options block, where it is naturally
> aligned, but the spec allows unaligned options and the kernel shouldn't
> silently drop legal packets.
>
> When the changed word is not aligned, the modified bytes straddle two
> checksum words, and using the standard incremental update helper
> (which assumes alignment) produces an invalid checksum:
>
> | w1 | w2 |
> OLD | a b | c d |
> NEW | a b' | c' d |
>
> Since b' and c' sit across w1 and w2, we could compute the incremental
> checksum in two operations by recalculating w1 and then w2:
>
> C' = C - w1 + w1' - w2 + w2'
>
> But working it out:
>
> C' = C - w1 - w2 + w1' + w2'
> = C - (a * 2^8 + b) - (c * 2^8 + d)
> + (a * 2^8 + b') + (c' * 2^8 + d)
> = C + 2^8 * (a - a + c' - c) + (b' - b + d - d)
> = C + 2^8 * (c' - c) + (b' - b)
> = C - (2^8 * c + b) + (2^8 * c' + b')
>
> So an unaligned incremental checksum can be done in a single operation
> by byteswapping the changed bytes before passing them to the helper.
> This patch implements that trick for unaligned MSS option updates.
>
> Signed-off-by: Kacper Kokot <kacper.kokot.44@gmail.com>
> ---
Just a couple of nits..
Please use nf-next target, like "[PATCH nf-next v3] netfilter:
xt_TCPMSS: ...". Let's handle this as an enhancement.
> I decided to go with the get_unaligned_be16 suggestion because
> it's idiomatic and it produces shorter assembly on x86-64
> (6 instructions vs 9). SYN processing is a cold path so
> I didn't look into it further.
>
> v2:
> - Use get_unaligned_be16 (Fernando's suggestion)
> - Fix alignment check expression (David)
> - Mention it's a theoretical bug in the commit message
> - Drop cc stable, the bug is only theoretical
>
> net/netfilter/xt_TCPMSS.c | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c
> index 80e1634bc51f..32c87a520361 100644
> --- a/net/netfilter/xt_TCPMSS.c
> +++ b/net/netfilter/xt_TCPMSS.c
> @ -117,8 +117,9 @@ tcpmss_mangle_packet(struct sk_buff *skb,
> for (i = sizeof(struct tcphdr); i <= tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
> if (opt[i] == TCPOPT_MSS && opt[i+1] == TCPOLEN_MSS) {
> u_int16_t oldmss;
> + u16 csum_oldmss, csum_newmss;
Please use reversed xmas tree:
+ u16 csum_oldmss, csum_newmss;
u_int16_t oldmss;
Thanks,
Fernando.
Will target nf-next and switch the variable declaration order in v3. Thanks for the review.
Hi,
On Thu, May 28, 2026 at 11:34:11PM +0100, Kacper Kokot wrote:
> Padding TCP options with NOPs is optional, so it is legal to send an
> MSS option that is not aligned to a word boundary and therefore not
> aligned for checksum calculation. The current TCPMSS target is not
> robust to this: when the MSS option is unaligned it produces an
> invalid checksum, and the packet is dropped.
Yes, but how many stacks do this?
> This has not been observed in any real environment.
... then why is this a fix?
> Senders place the MSS at the beginning of the options block, where
> it is naturally aligned, but the spec allows unaligned options and
> the kernel shouldn't silently drop legal packets.
This is questionably a "clean packet".
And "the kernel is not silently dropping anything, it is policy that
would drop it", and there are mechanisms to track what rule is
dropping this packet.
> When the changed word is not aligned, the modified bytes straddle two
> checksum words, and using the standard incremental update helper
> (which assumes alignment) produces an invalid checksum:
>
> | w1 | w2 |
> OLD | a b | c d |
> NEW | a b' | c' d |
>
> Since b' and c' sit across w1 and w2, we could compute the incremental
> checksum in two operations by recalculating w1 and then w2:
>
> C' = C - w1 + w1' - w2 + w2'
>
> But working it out:
>
> C' = C - w1 - w2 + w1' + w2'
> = C - (a * 2^8 + b) - (c * 2^8 + d)
> + (a * 2^8 + b') + (c' * 2^8 + d)
> = C + 2^8 * (a - a + c' - c) + (b' - b + d - d)
> = C + 2^8 * (c' - c) + (b' - b)
> = C - (2^8 * c + b) + (2^8 * c' + b')
>
> So an unaligned incremental checksum can be done in a single operation
> by byteswapping the changed bytes before passing them to the helper.
> This patch implements that trick for unaligned MSS option updates.
If this is a fix as the title specify, there no Fixes: tag?
To me, this qualifies as an enhancement, if anything.
Maybe more correct subject is:
"netfilter: TCPMSS: handle packets with unaligned MSS option"
since this mangling unaligned MSS has never been supported this far.
Thanks
> Signed-off-by: Kacper Kokot <kacper.kokot.44@gmail.com>
> ---
> I decided to go with the get_unaligned_be16 suggestion because
> it's idiomatic and it produces shorter assembly on x86-64
> (6 instructions vs 9). SYN processing is a cold path so
> I didn't look into it further.
>
> v2:
> - Use get_unaligned_be16 (Fernando's suggestion)
> - Fix alignment check expression (David)
> - Mention it's a theoretical bug in the commit message
> - Drop cc stable, the bug is only theoretical
>
> net/netfilter/xt_TCPMSS.c | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c
> index 80e1634bc51f..32c87a520361 100644
> --- a/net/netfilter/xt_TCPMSS.c
> +++ b/net/netfilter/xt_TCPMSS.c
> @ -117,8 +117,9 @@ tcpmss_mangle_packet(struct sk_buff *skb,
> for (i = sizeof(struct tcphdr); i <= tcp_hdrlen - TCPOLEN_MSS; i += optlen(opt, i)) {
> if (opt[i] == TCPOPT_MSS && opt[i+1] == TCPOLEN_MSS) {
> u_int16_t oldmss;
> + u16 csum_oldmss, csum_newmss;
>
> - oldmss = (opt[i+2] << 8) | opt[i+3];
> + oldmss = get_unaligned_be16(&opt[i+2]);
>
> /* Never increase MSS, even when setting it, as
> * doing so results in problems for hosts that rely
> @@ -130,8 +131,19 @@ tcpmss_mangle_packet(struct sk_buff *skb,
> opt[i+2] = (newmss & 0xff00) >> 8;
> opt[i+3] = newmss & 0x00ff;
>
> + csum_oldmss = htons(oldmss);
> + csum_newmss = htons(newmss);
> +
> + /* MSS may be unaligned; fix up the incremental checksum
> + * to avoid an invalid checksum and a dropped packet.
> + */
> + if (((char *)&opt[i + 2] - (char *)tcph) & 0x1) {
> + csum_oldmss = swab16(csum_oldmss);
> + csum_newmss = swab16(csum_newmss);
> + }
> +
> inet_proto_csum_replace2(&tcph->check, skb,
> - htons(oldmss), htons(newmss),
> + csum_oldmss, csum_newmss,
> false);
> return 0;
> }
> --
> 2.43.0
>
> > Padding TCP options with NOPs is optional, so it is legal to send an > > MSS option that is not aligned to a word boundary [...] > > Yes, but how many stacks do this? None that I'm aware of. Mainstream stacks pad everything to a word boundary and put MSS as the first option. The motivation is RFC 9293 (MUST-64) spec conformance rather than a bug seen in the wild. > > This has not been observed in any real environment. > > ... then why is this a fix? > [...] > To me, this qualifies as an enhancement, if anything. I'll drop the "fix" and reframe this as an enhancement including your suggested subject line. > This is questionably a "clean packet". Fair point, I shouldn't have framed it as legitimate/clean traffic being dropped. > And "the kernel is not silently dropping anything, it is policy that > would drop it" [...] I'll reword the commit message to say the mangled packet ends up with an invalid checksum and could then be dropped by policy, rather than implying the kernel itself drops it. Thanks for the review.
© 2016 - 2026 Red Hat, Inc.