Ever since commit c2ff29e99a76 ("siw: Inline do_tcp_sendpages()"),
we have been doing this:
static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
			     size_t size)
[...]
		/* Calculate the number of bytes we need to push, for this page
		 * specifically */
		size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);
		/* If we can't splice it, then copy it in, as normal */
		if (!sendpage_ok(page[i]))
			msg.msg_flags &= ~MSG_SPLICE_PAGES;
		/* Set the bvec pointing to the page, with len $bytes */
		bvec_set_page(&bvec, page[i], bytes, offset);
		/* Set the iter to $size, aka the size of the whole sendpages (!!!) */
		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
try_page_again:
		lock_sock(sk);
		/* Sendmsg with $size size (!!!) */
		rv = tcp_sendmsg_locked(sk, &msg, size);
This means we've been sending oversized iov_iters and making oversized
tcp_sendmsg() calls for a while. This has been a benign bug because
sendpage_ok() always returned true. With the recent slab allocator changes
being slowly introduced into -next (which disallow sendpage on large kmalloc
allocations), we have recently hit out-of-bounds crashes, due to slight
differences in iov_iter behavior between the MSG_SPLICE_PAGES and
"regular" copy paths:
(MSG_SPLICE_PAGES)
skb_splice_from_iter
  iov_iter_extract_pages
    iov_iter_extract_bvec_pages
      uses i->nr_segs to correctly stop in its tracks before OoB'ing everywhere
  skb_splice_from_iter gets a "short" read

(!MSG_SPLICE_PAGES)
skb_copy_to_page_nocache copy=iov_iter_count
  [...]
    copy_from_iter
      /* this doesn't help */
      if (unlikely(iter->count < len))
	      len = iter->count;
      iterate_bvec
	... and we run off the bvecs
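
To make the divergence concrete, here is a minimal, self-contained userspace
model of the mismatch (purely illustrative: toy_bvec, TOY_PAGE_SIZE and the
rest are made-up names, not the driver or the real iov_iter code). A single
bvec covers one page, but the iterator advertises the whole request size, so
a consumer that trusts the byte count walks off the only segment, exactly as
described for the copy path above; the extract path instead stops at nr_segs
and just comes up short:

	#include <stdio.h>
	#include <stddef.h>

	#define TOY_PAGE_SIZE 4096u

	struct toy_bvec {
		const unsigned char *base;	/* start of the segment */
		size_t len;			/* bytes in this segment */
	};

	int main(void)
	{
		static unsigned char page0[TOY_PAGE_SIZE];	/* the one real page */
		struct toy_bvec bvec = { .base = page0, .len = TOY_PAGE_SIZE };

		size_t size = 3 * TOY_PAGE_SIZE;	/* whole "sendpages" request */

		/* The bug: the iterator's byte count is set to 'size' although
		 * only one bvec of TOY_PAGE_SIZE bytes is attached to it. */
		size_t iter_count = size;

		size_t copied = 0, off = 0, nr_segs = 1, seg = 0;

		/* A copy path that trusts iter_count, as described above. */
		while (copied < iter_count) {
			if (seg >= nr_segs) {
				printf("would walk %zu bytes past the last bvec\n",
				       iter_count - copied);
				return 1;
			}
			size_t chunk = bvec.len - off;

			if (chunk > iter_count - copied)
				chunk = iter_count - copied;
			/* the real code would copy from bvec.base + off here */
			copied += chunk;
			off += chunk;
			if (off == bvec.len) {	/* segment exhausted, move on */
				seg++;
				off = 0;
			}
		}
		printf("copied %zu bytes\n", copied);
		return 0;
	}

Run as-is, this prints "would walk 8192 bytes past the last bvec"; with the
iterator count capped to the bvec length (the fix below), the loop instead
terminates cleanly after one page.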
Fix this by properly setting the iov_iter's byte count and by passing the
correct byte count to tcp_sendmsg_locked().
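
For reference, this is a sketch of the intended per-page pattern after the
fix (simplified: it assumes the surrounding variables of siw_tcp_sendpages
and elides error handling and the !sendpage_ok() fallback; the authoritative
change is the diff below). Both the iterator and the sendmsg call now
advertise exactly the bytes covered by the single bvec:

	/* per-page slice of the request */
	size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);

	bvec_set_page(&bvec, page[i], bytes, offset);
	/* advertise exactly what the single bvec holds... */
	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);

	lock_sock(sk);
	/* ...and ask tcp_sendmsg_locked() for the same amount */
	rv = tcp_sendmsg_locked(sk, &msg, bytes);
	release_sock(sk);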
Cc: stable@vger.kernel.org
Fixes: c2ff29e99a76 ("siw: Inline do_tcp_sendpages()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202507220801.50a7210-lkp@intel.com
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Pedro Falcato <pfalcato@suse.de>
---
v2:
- Add David Howells's Rb on the original patch
- Remove the offset increment, since it's dead code
drivers/infiniband/sw/siw/siw_qp_tx.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index 3a08f57d2211..f7dd32c6e5ba 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -340,18 +340,17 @@ static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
 		if (!sendpage_ok(page[i]))
 			msg.msg_flags &= ~MSG_SPLICE_PAGES;
 		bvec_set_page(&bvec, page[i], bytes, offset);
-		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);
 
 try_page_again:
 		lock_sock(sk);
-		rv = tcp_sendmsg_locked(sk, &msg, size);
+		rv = tcp_sendmsg_locked(sk, &msg, bytes);
 		release_sock(sk);
 
 		if (rv > 0) {
 			size -= rv;
 			sent += rv;
 			if (rv != bytes) {
-				offset += rv;
 				bytes -= rv;
 				goto try_page_again;
 			}
--
2.50.1
On Tue, Jul 29, 2025 at 01:03:48PM +0100, Pedro Falcato wrote:
> Ever since commit c2ff29e99a76 ("siw: Inline do_tcp_sendpages()"),
> we have been doing this:
[...]
> Cc: stable@vger.kernel.org
> Fixes: c2ff29e99a76 ("siw: Inline do_tcp_sendpages()")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202507220801.50a7210-lkp@intel.com
> Reviewed-by: David Howells <dhowells@redhat.com>
> Signed-off-by: Pedro Falcato <pfalcato@suse.de>

Applied thanks,
Jason
On 29.07.2025 14:03, Pedro Falcato wrote:
> Ever since commit c2ff29e99a76 ("siw: Inline do_tcp_sendpages()"),
> we have been doing this:
[...]
> Reviewed-by: David Howells <dhowells@redhat.com>
> Signed-off-by: Pedro Falcato <pfalcato@suse.de>

Acked-by: Bernard Metzler <bernard.metzler@linux.dev>
On Tue, Jul 29, 2025 at 08:53:02PM +0200, Bernard Metzler wrote:
> On 29.07.2025 14:03, Pedro Falcato wrote:
> > Ever since commit c2ff29e99a76 ("siw: Inline do_tcp_sendpages()"),
> > we have been doing this:
[...]
> Acked-by: Bernard Metzler <bernard.metzler@linux.dev>

Thanks!

Do you want to take the fix through your tree? Otherwise I suspect Vlastimil
could simply take it (and possibly resubmit the SLAB PR, which hasn't been
merged yet).

-- 
Pedro
On 30.07.2025 11:26, Pedro Falcato wrote:
> On Tue, Jul 29, 2025 at 08:53:02PM +0200, Bernard Metzler wrote:
>> On 29.07.2025 14:03, Pedro Falcato wrote:
>>> Ever since commit c2ff29e99a76 ("siw: Inline do_tcp_sendpages()"),
>>> we have been doing this:
[...]
>> Acked-by: Bernard Metzler <bernard.metzler@linux.dev>
>
> Thanks!
>
> Do you want to take the fix through your tree? Otherwise I suspect Vlastimil
> could simply take it (and possibly resubmit the SLAB PR, which hasn't been
> merged yet).

Thanks Pedro. Having Vlastimil take care of it sounds good to me.

I am currently without development infrastructure (small village in the
mountains thing). And fixing the SLAB PR in the first place would be even
better.

Best,
Bernard.
On 7/31/25 22:19, Bernard Metzler wrote:
> On 30.07.2025 11:26, Pedro Falcato wrote:
>> On Tue, Jul 29, 2025 at 08:53:02PM +0200, Bernard Metzler wrote:
>>> On 29.07.2025 14:03, Pedro Falcato wrote:
[...]
>> Do you want to take the fix through your tree? Otherwise I suspect Vlastimil
>> could simply take it (and possibly resubmit the SLAB PR, which hasn't been
>> merged yet).
>
> Thanks Pedro. Having Vlastimil take care of it sounds good to me.
>
> I am currently without development infrastructure (small village in the
> mountains thing). And fixing the SLAB PR in the first place would be even
> better.

The SLAB PR was meanwhile merged and Jason said he would take care of
sending a PR with the fix:
https://lore.kernel.org/all/20250730184724.GC89283@nvidia.com/