[edk2-devel] [PATCH 0/3] MdePkg/BaseLib: Base64Decode: Make it follow its specification

Gao, Zhichao posted 3 patches 4 years, 10 months ago
Failed in applying to current master (apply log)
MdePkg/Library/BaseLib/String.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)
[edk2-devel] [PATCH 0/3] MdePkg/BaseLib: Base64Decode: Make it follow its specification
Posted by Gao, Zhichao 4 years, 10 months ago
REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1891

Adjust the coding style.
Set DestinationSize before return.
Add addition decription for the RETURN_SUCCESS case.

Cc: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Liming Gao <liming.gao@intel.com>
Cc: Marvin Hauser <mhaeuser@outlook.de>
Cc: Laszlo Ersek <lersek@redhat.com>

Zhichao Gao (3):
  MdePkg/BaseLib: Adjust the coding style in Base64Decode
  MdePkg/BaseLib: Base64Decode: Make DestinationSize complied to spec
  MdePkg/BaseLib: Base64Decode: Add decription for RETURN_SUCCESS

 MdePkg/Library/BaseLib/String.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

-- 
2.21.0.windows.1


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.

View/Reply Online (#42968): https://edk2.groups.io/g/devel/message/42968
Mute This Topic: https://groups.io/mt/32238987/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub  [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-

Re: [edk2-devel] [PATCH 0/3] MdePkg/BaseLib: Base64Decode: Make it follow its specification
Posted by Laszlo Ersek 4 years, 10 months ago
On 06/28/19 05:57, Gao, Zhichao wrote:
> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1891
> 
> Adjust the coding style.
> Set DestinationSize before return.
> Add addition decription for the RETURN_SUCCESS case.
> 
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Cc: Liming Gao <liming.gao@intel.com>
> Cc: Marvin Hauser <mhaeuser@outlook.de>
> Cc: Laszlo Ersek <lersek@redhat.com>
> 
> Zhichao Gao (3):
>   MdePkg/BaseLib: Adjust the coding style in Base64Decode
>   MdePkg/BaseLib: Base64Decode: Make DestinationSize complied to spec
>   MdePkg/BaseLib: Base64Decode: Add decription for RETURN_SUCCESS
> 
>  MdePkg/Library/BaseLib/String.c | 28 ++++++++++++++++++----------
>  1 file changed, 18 insertions(+), 10 deletions(-)
> 

I'll try to review patches #2 and #3 in this series later.

If you get sufficient MdePkg owner R-b's meanwhile, don't wait for me
with pushing the patches.

Thanks
Laszlo

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.

View/Reply Online (#43024): https://edk2.groups.io/g/devel/message/43024
Mute This Topic: https://groups.io/mt/32238987/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub  [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-

Re: [edk2-devel] [PATCH 0/3] MdePkg/BaseLib: Base64Decode: Make it follow its specification
Posted by Laszlo Ersek 4 years, 10 months ago
On 06/28/19 05:57, Gao, Zhichao wrote:
> REF: https://bugzilla.tianocore.org/show_bug.cgi?id=1891
> 
> Adjust the coding style.
> Set DestinationSize before return.
> Add addition decription for the RETURN_SUCCESS case.
> 
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Cc: Liming Gao <liming.gao@intel.com>
> Cc: Marvin Hauser <mhaeuser@outlook.de>
> Cc: Laszlo Ersek <lersek@redhat.com>
> 
> Zhichao Gao (3):
>   MdePkg/BaseLib: Adjust the coding style in Base64Decode
>   MdePkg/BaseLib: Base64Decode: Make DestinationSize complied to spec
>   MdePkg/BaseLib: Base64Decode: Add decription for RETURN_SUCCESS
> 
>  MdePkg/Library/BaseLib/String.c | 28 ++++++++++++++++++----------
>  1 file changed, 18 insertions(+), 10 deletions(-)
> 

Issues that have not been addressed by this patch set, but should be:

(1) the leading comment says, "Produce Null-terminated binary data in
the output buffer". That's bogus, the binary output is never
NUL-terminated (nor should it be).


(2) One of the RETURN_INVALID_PARAMETER cases is documented as:

"If SourceLength or DestinationSize is bigger than (MAX_ADDRESS
-(UINTN)Destination )."

There are two problems with this.


(2a) SourceLength has nothing to do with Destination. The comment should
be updated -- making sure that (Source + SourceLength) do not overflow
MAX_ADDRESS is worthwhile, but the comment is misleading.


(2b) The code that actually performs the check is off by one. What we
need is that the byte *one past* each buffer still be expressible as a
valid address, so mathematically we need

  (UINTN)Buffer + BufferLength <= MAX_ADDRESS

If you reorder this for a C expression that cannot overflow, you get

  BufferLength <= MAX_ADDRESS - (UINTN)Buffer

If you negate this, to express the failure condition, you get

  BufferLength > MAX_ADDRESS - (UINTN)Buffer

Therefore, the comment for RETURN_INVALID_PARAMETER that says

  DestinationSize is bigger than (MAX_ADDRESS -(UINTN)Destination)

is correct; *however*, the code is wrong:

  *DestinationSize >= (MAX_ADDRESS - (UINTN)Destination)

This failure condition is too lax (IOW, the success condition is too
strict), and shuld be restricted (IOW the success condition should be
relaxed).


(3) Maintaining a signed integer (INTN) BufferSize, and then doing
arithmetic on it with UINTN values, is really bad practice. In
particular, the following expression makes me nervous:

  BufferSize += ActualSourceLength / 4 * 3;

A separate UINTN variable called "EqualSigns" should be introduced,
BufferSize should be made an UINTN, and the logic should be reworked
using those.


(4) "DecodingTable" should be called "mDecodingTable".


(5) The decoding loop checks

  (SourceIndex < SourceLength) && (DestinationIndex < *DestinationSize)

and we have an inner loop

      do {
        Chr = DecodingTable[(UINT8) Source[SourceIndex++]];
      } while (Chr == BAD_V);

Now consider the following case. The caller passes in a valid
(*DestinationSize) that is larger than what is requried for the
decoding. In addition, assume that the input data (Source), which is
otherwise completely valid, is terminated with a space character (or
even with a NUL character, although NUL-termination is not required by
the function's specification).

In the above situation, the innermost loop, which scans for BAD_V, will
fall off the end of Source. The outermost loop condition will evaluate
to TRUE (we have some Source characters left -- namely, one space, or
NUL), and we have room in the Destination buffer too (the caller
specified / allocated a larger DestinationSize thatn what is required).
So we reach the innermost loop, and the space character (BAD_V) will
lead it right off the end of Source.

The outermost loop condition should be changed.

First, the SourceIndex subcondition should be dropped altogether.

Second, *DestinationSize should be set to the actual decoded data size
*before* starting the actual decoding. Then, if we still have output
bytes to produce, in the outermost loop, the Source scanning in the
innermost BAD_V loop is guaranteed to remain in-bounds.


... Honestly, at this point, I sort of wish we just rewrote this
function from zero. The current *approach* of the function is wrong. The
function currently forms a mental image of how the input data "should"
look, and tries to parse that -- it tries to shoehorn the input into the
"expected" format. If the input does not look like the expectation, we
run into gaps here and there.

Instead, the function should follow a state machine approach, where the
outermost loop scans input characters one by one, and makes *absolutely
no assumption* about the character that has just been found. Every UINT8
character in the input should be checked against the full possible UINT8
domain (valid BASE64 range, the equal sign, tolerated whitespace, and
the rest), and acted upon accordingly.

For example, valid BASE64 characters should be accumulated into a 24-bit
value, and flushed when the latter becomes full, and also at the end of
the scanning loop.

Counting vs. decoding can be implemented by making just the flushing
operation conditional (do not write to memory).

Thanks
Laszlo

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.

View/Reply Online (#43116): https://edk2.groups.io/g/devel/message/43116
Mute This Topic: https://groups.io/mt/32238987/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub  [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-

Re: [edk2-devel] [PATCH 0/3] MdePkg/BaseLib: Base64Decode: Make it follow its specification
Posted by Laszlo Ersek 4 years, 10 months ago
On 07/01/19 13:02, Laszlo Ersek wrote:
> On 06/28/19 05:57, Gao, Zhichao wrote:

> (2a) SourceLength has nothing to do with Destination. The comment should
> be updated -- making sure that (Source + SourceLength) do not overflow
> MAX_ADDRESS is worthwhile, but the comment is misleading.

Let me re-state that.

Usually, when you expect the caller to provide an array of bytes,
identified by base address and size, the burden to provide an *actual
array* is on the caller. If the caller does not conform to the function
specification, the behavior is undefined, and it's on the caller.
Therefore, checking MAX_ADDRESS overflows is *generally* dubious, in my
opinion, because a valid array can never overflow MAX_ADDRESS.

If we want to be paranoid about this, I guess we can keep implement
MAX_ADDRESS checks, but then we should both document and implement them
correctly.

Second, it is usually good to specify whether overlap between input and
output is permitted. If we want to be paranoid, we can check that
explicitly again. I don't necessarily suggest that we implement an
overlap check, but we should likely specify in the leading comment that
overlap is not permitted. (This is similar to the "restrict" keyword
from C99.)

Thanks
Laszlo

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.

View/Reply Online (#43118): https://edk2.groups.io/g/devel/message/43118
Mute This Topic: https://groups.io/mt/32238987/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub  [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-

Re: [edk2-devel] [PATCH 0/3] MdePkg/BaseLib: Base64Decode: Make it follow its specification
Posted by Laszlo Ersek 4 years, 10 months ago
On 07/01/19 13:02, Laszlo Ersek wrote:

> ... Honestly, at this point, I sort of wish we just rewrote this
> function from zero. The current *approach* of the function is wrong. The
> function currently forms a mental image of how the input data "should"
> look, and tries to parse that -- it tries to shoehorn the input into the
> "expected" format. If the input does not look like the expectation, we
> run into gaps here and there.
> 
> Instead, the function should follow a state machine approach, where the
> outermost loop scans input characters one by one, and makes *absolutely
> no assumption* about the character that has just been found. Every UINT8
> character in the input should be checked against the full possible UINT8
> domain (valid BASE64 range, the equal sign, tolerated whitespace, and
> the rest), and acted upon accordingly.
> 
> For example, valid BASE64 characters should be accumulated into a 24-bit
> value, and flushed when the latter becomes full, and also at the end of
> the scanning loop.
> 
> Counting vs. decoding can be implemented by making just the flushing
> operation conditional (do not write to memory).

If time allows, I'd like to attempt contributing a version like this.
Please give me a bit of time to work on that.

Thanks
Laszlo

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.

View/Reply Online (#43131): https://edk2.groups.io/g/devel/message/43131
Mute This Topic: https://groups.io/mt/32238987/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub  [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-