Currently the function responsible for converting UTF-16 strings to
UTF-8 ignores any characters that cannot be converted. However, this
also includes multi-byte characters that do not fit into the provided
string buffer.

This can cause problems if such a multi-byte character is followed by
a single-byte character. In that case the multi-byte character is
ignored when the provided string buffer is too small, but the
single-byte character might still fit and is thus copied into the
resulting string.

Fix this by no longer filling the provided string buffer once a
character does not fit. To make this possible, extend utf32_to_utf8()
to return meaningful errno codes instead of -1.

Fixes: 74675a58507e ("NLS: update handling of Unicode")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
---
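The behaviour change can be illustrated with a minimal userspace sketch.
This is not the kernel code: sketch_utf32_to_utf8(), sketch_convert() and
the stop_on_overflow flag are hypothetical names used only for this
illustration, the encoder is a simplified stand-in for the table-driven
one in nls_base.c, and surrogate pairs are not combined here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Simplified stand-in for utf32_to_utf8() (assumption, not kernel code).
 * Returns the number of bytes written, -EILSEQ for a code point that can
 * never be encoded, or -EOVERFLOW if the encoding does not fit in maxout.
 */
static int sketch_utf32_to_utf8(uint32_t u, uint8_t *s, int maxout)
{
	static const uint8_t lead[] = { 0, 0x00, 0xC0, 0xE0, 0xF0 };
	int len, i;

	if (u > 0x10FFFF || (u & 0xFFFFF800u) == 0xD800u)
		return -EILSEQ;

	len = u < 0x80 ? 1 : u < 0x800 ? 2 : u < 0x10000 ? 3 : 4;
	if (len > maxout)
		return -EOVERFLOW;

	/* Fill continuation bytes from the end, six payload bits each. */
	for (i = len - 1; i > 0; i--) {
		s[i] = (uint8_t)(0x80 | (u & 0x3F));
		u >>= 6;
	}
	s[0] = (uint8_t)(lead[len] | u);
	return len;
}

/*
 * Simplified conversion loop. With stop_on_overflow == 0 it mimics the
 * old behaviour (skip anything that fails); with stop_on_overflow != 0
 * it stops at the first character that does not fit.
 */
static int sketch_convert(const uint16_t *in, int inlen, uint8_t *out,
			  int maxout, int stop_on_overflow)
{
	uint8_t *op = out;
	int size;

	assert(in && out);
	while (inlen > 0 && maxout > 0) {
		uint32_t u = *in++;

		inlen--;
		size = sketch_utf32_to_utf8(u, op, maxout);
		if (size == -EOVERFLOW && stop_on_overflow)
			break;		/* character does not fit: stop */
		if (size < 0)
			continue;	/* ignore character and move on */
		op += size;
		maxout -= size;
	}
	return (int)(op - out);
}
```

For the input { 0x20AC, 'A' } and a two-byte buffer, calling
sketch_convert() with stop_on_overflow == 0 writes the single byte 'A'
(the three-byte euro sign is silently dropped), while
stop_on_overflow != 0 writes nothing.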
fs/nls/nls_base.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/fs/nls/nls_base.c b/fs/nls/nls_base.c
index 18d597e49a19..d434c4463a8f 100644
--- a/fs/nls/nls_base.c
+++ b/fs/nls/nls_base.c
@@ -94,7 +94,7 @@ int utf32_to_utf8(unicode_t u, u8 *s, int maxout)
 
 	l = u;
 	if (l > UNICODE_MAX || (l & SURROGATE_MASK) == SURROGATE_PAIR)
-		return -1;
+		return -EILSEQ;
 
 	nc = 0;
 	for (t = utf8_table; t->cmask && maxout; t++, maxout--) {
@@ -110,7 +110,7 @@ int utf32_to_utf8(unicode_t u, u8 *s, int maxout)
 			return nc;
 		}
 	}
-	return -1;
+	return -EOVERFLOW;
 }
 
 EXPORT_SYMBOL(utf32_to_utf8);
@@ -217,8 +217,16 @@ int utf16s_to_utf8s(const wchar_t *pwcs, int inlen, enum utf16_endian endian,
 				inlen--;
 			}
 			size = utf32_to_utf8(u, op, maxout);
-			if (size == -1) {
-				/* Ignore character and move on */
+			if (size < 0) {
+				if (size == -EILSEQ) {
+					/* Ignore character and move on */
+					continue;
+				}
+				/*
+				 * Stop filling the buffer with data once a character
+				 * does not fit anymore.
+				 */
+				break;
 			} else {
 				op += size;
 				maxout -= size;
--
2.39.5

On Tue, Nov 11, 2025 at 02:11:22PM +0100, Armin Wolf wrote:
> Currently the function responsible for converting UTF-16 strings to
> UTF-8 ignores any characters that cannot be converted. However, this
> also includes multi-byte characters that do not fit into the provided
> string buffer.
>
> This can cause problems if such a multi-byte character is followed by
> a single-byte character. In that case the multi-byte character is
> ignored when the provided string buffer is too small, but the
> single-byte character might still fit and is thus copied into the
> resulting string.
>
> Fix this by no longer filling the provided string buffer once a
> character does not fit. To make this possible, extend utf32_to_utf8()
> to return meaningful errno codes instead of -1.

Can you also update utf8_to_utf32() to return meaningful error codes?
Without that, we have inconsistent APIs.

--
With Best Regards,
Andy Shevchenko

On 26.11.25 at 16:18, Andy Shevchenko wrote:
> On Tue, Nov 11, 2025 at 02:11:22PM +0100, Armin Wolf wrote:
>> Currently the function responsible for converting UTF-16 strings to
>> UTF-8 ignores any characters that cannot be converted. However, this
>> also includes multi-byte characters that do not fit into the provided
>> string buffer.
>>
>> This can cause problems if such a multi-byte character is followed by
>> a single-byte character. In that case the multi-byte character is
>> ignored when the provided string buffer is too small, but the
>> single-byte character might still fit and is thus copied into the
>> resulting string.
>>
>> Fix this by no longer filling the provided string buffer once a
>> character does not fit. To make this possible, extend utf32_to_utf8()
>> to return meaningful errno codes instead of -1.
> Can you also update utf8_to_utf32() to return meaningful error codes?
> Without that, we have inconsistent APIs.
>
Sure, I will send a separate patch for that.

Thanks,
Armin Wolf