[PATCH v2 1/4] fs/nls: Fix utf16 to utf8 conversion

Armin Wolf posted 4 patches 1 month, 1 week ago
Posted by Armin Wolf 1 month, 1 week ago
Currently utf16s_to_utf8s(), the function responsible for converting
utf16 strings to utf8, ignores any characters that cannot be converted.
This however also includes multi-byte characters that do not fit into
the provided string buffer.

This can cause problems if such a multi-byte character is followed by
a single-byte character. In such a case the multi-byte character might
be ignored when the provided string buffer is too small, but the
single-byte character might fit and is thus still copied into the
resulting string.

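Fix this by no longer filling the provided string buffer once a
character does not fit. In order to be able to do this, extend
utf32_to_utf8() to return useful errno codes instead of -1: -EILSEQ
for an invalid code point and -EOVERFLOW when the character does not
fit into the remaining buffer.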
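
For illustration only, the difference between the old and the new
handling can be reproduced with a small userspace sketch. This is not
the kernel code: the sketch_* names and the stop_on_overflow flag are
made up for this example, the input is assumed to be host-endian, and
surrogate pairs are not combined (every surrogate is simply skipped as
an invalid code point).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in for utf32_to_utf8() (hypothetical, not the kernel
 * implementation). Returns the number of bytes written, -EILSEQ for an
 * invalid code point or -EOVERFLOW if the sequence does not fit.
 */
static int sketch_utf32_to_utf8(uint32_t u, unsigned char *s, int maxout)
{
	static const unsigned char lead[] = { 0x00, 0xC0, 0xE0, 0xF0 };
	int len, i;

	/* Reject values above U+10FFFF and surrogate code points. */
	if (u > 0x10FFFF || (u & 0xFFFFF800) == 0xD800)
		return -EILSEQ;

	len = (u < 0x80) ? 1 : (u < 0x800) ? 2 : (u < 0x10000) ? 3 : 4;
	if (len > maxout)
		return -EOVERFLOW;

	/* Lead byte carries the top bits, each trailing byte 6 more bits. */
	s[0] = (unsigned char)(lead[len - 1] | (u >> (6 * (len - 1))));
	for (i = 1; i < len; i++)
		s[i] = (unsigned char)(0x80 | ((u >> (6 * (len - 1 - i))) & 0x3F));

	return len;
}

/*
 * Simplified stand-in for utf16s_to_utf8s().
 * stop_on_overflow == 0 mimics the old "ignore and move on" handling,
 * stop_on_overflow != 0 mimics the handling described in this patch.
 * Returns the number of bytes written to out.
 */
static int sketch_utf16s_to_utf8s(const uint16_t *in, int inlen,
				  unsigned char *out, int maxout,
				  int stop_on_overflow)
{
	unsigned char *op = out;
	int size;

	while (inlen > 0 && maxout > 0) {
		uint32_t u = *in;

		if (!u)
			break;
		in++;
		inlen--;

		size = sketch_utf32_to_utf8(u, op, maxout);
		if (size < 0) {
			/* Invalid characters are skipped in both modes. */
			if (size == -EILSEQ || !stop_on_overflow)
				continue;
			/* Character does not fit: stop filling the buffer. */
			break;
		}
		op += size;
		maxout -= size;
	}
	return (int)(op - out);
}
```

With the input U+20AC (three bytes in utf8) followed by 'A' and a
two-byte output buffer, the old handling skips U+20AC but still copies
'A', while the new handling stops at U+20AC and writes nothing.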

Fixes: 74675a58507e ("NLS: update handling of Unicode")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
---
 fs/nls/nls_base.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/nls/nls_base.c b/fs/nls/nls_base.c
index 18d597e49a19..d434c4463a8f 100644
--- a/fs/nls/nls_base.c
+++ b/fs/nls/nls_base.c
@@ -94,7 +94,7 @@ int utf32_to_utf8(unicode_t u, u8 *s, int maxout)
 
 	l = u;
 	if (l > UNICODE_MAX || (l & SURROGATE_MASK) == SURROGATE_PAIR)
-		return -1;
+		return -EILSEQ;
 
 	nc = 0;
 	for (t = utf8_table; t->cmask && maxout; t++, maxout--) {
@@ -110,7 +110,7 @@ int utf32_to_utf8(unicode_t u, u8 *s, int maxout)
 			return nc;
 		}
 	}
-	return -1;
+	return -EOVERFLOW;
 }
 EXPORT_SYMBOL(utf32_to_utf8);
 
@@ -217,8 +217,16 @@ int utf16s_to_utf8s(const wchar_t *pwcs, int inlen, enum utf16_endian endian,
 				inlen--;
 			}
 			size = utf32_to_utf8(u, op, maxout);
-			if (size == -1) {
-				/* Ignore character and move on */
+			if (size < 0) {
+				if (size == -EILSEQ) {
+					/* Ignore character and move on */
+					continue;
+				}
+				/*
+				 * Stop filling the buffer with data once a character
+				 * does not fit anymore.
+				 */
+				break;
 			} else {
 				op += size;
 				maxout -= size;
-- 
2.39.5
Re: [PATCH v2 1/4] fs/nls: Fix utf16 to utf8 conversion
Posted by Andy Shevchenko 3 weeks, 1 day ago
On Tue, Nov 11, 2025 at 02:11:22PM +0100, Armin Wolf wrote:
> Currently the function responsible for converting between utf16 and
> utf8 strings will ignore any characters that cannot be converted. This
> however also includes multi-byte characters that do not fit into the
> provided string buffer.
> 
> This can cause problems if such a multi-byte character is followed by
> a single-byte character. In such a case the multi-byte character might
> be ignored when the provided string buffer is too small, but the
> single-byte character might fit and is thus still copied into the
> resulting string.
> 
> Fix this by no longer filling the provided string buffer once a character
> does not fit. In order to be able to do this extend utf32_to_utf8()
> to return useful errno codes instead of -1.

Can you also update utf8_to_utf32() to return meaningful error codes?
Without that done we have inconsistent APIs.

-- 
With Best Regards,
Andy Shevchenko
Re: [PATCH v2 1/4] fs/nls: Fix utf16 to utf8 conversion
Posted by Armin Wolf 3 weeks, 1 day ago
On 26.11.25 at 16:18, Andy Shevchenko wrote:

> On Tue, Nov 11, 2025 at 02:11:22PM +0100, Armin Wolf wrote:
>> Currently the function responsible for converting between utf16 and
>> utf8 strings will ignore any characters that cannot be converted. This
>> however also includes multi-byte characters that do not fit into the
>> provided string buffer.
>>
>> This can cause problems if such a multi-byte character is followed by
>> a single-byte character. In such a case the multi-byte character might
>> be ignored when the provided string buffer is too small, but the
>> single-byte character might fit and is thus still copied into the
>> resulting string.
>>
>> Fix this by no longer filling the provided string buffer once a character
>> does not fit. In order to be able to do this extend utf32_to_utf8()
>> to return useful errno codes instead of -1.
> Can you also update utf8_to_utf32() to return meaningful error codes?
> Without that done we have inconsistent APIs.
>
Sure, I will send a separate patch for that.

Thanks,
Armin Wolf