[PATCH v6 3/4] qcow2: add zstd cluster compression

Denis Plotnikov posted 4 patches 5 years, 8 months ago
zstd significantly reduces cluster compression time.
It provides better compression performance while maintaining
the same compression ratio as zlib, which is currently the
only compression method available.

The performance test results:
The test compresses and decompresses a qemu qcow2 image with a
freshly installed rhel-7.6 guest.
Image cluster size: 64K. Image on-disk size: 2.2G

The test was conducted on a brd (RAM-backed) disk to reduce the
influence of the disk subsystem on the test results.
The results are given in seconds.

compress cmd:
  time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
                  src.img [zlib|zstd]_compressed.img
decompress cmd:
  time ./qemu-img convert -O qcow2
                  [zlib|zstd]_compressed.img uncompressed.img

           compression               decompression
         zlib       zstd           zlib         zstd
------------------------------------------------------------
real     65.5       16.3 (-75 %)    1.9          1.6 (-16 %)
user     65.0       15.8            5.3          2.5
sys       3.3        0.2            2.0          2.0

Both zlib and zstd gave the same compression ratio: 1.57
The compressed image size in both cases: 1.4G

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
QAPI part:
Acked-by: Markus Armbruster <armbru@redhat.com>
---
 docs/interop/qcow2.txt |  20 +++++++
 configure              |   2 +-
 qapi/block-core.json   |   3 +-
 block/qcow2-threads.c  | 124 +++++++++++++++++++++++++++++++++++++++++
 block/qcow2.c          |  11 ++++
 5 files changed, 158 insertions(+), 2 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 5597e24474..9048114445 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -208,6 +208,7 @@ version 2.
 
                     Available compression type values:
                         0: zlib <https://www.zlib.net/>
+                        1: zstd <http://github.com/facebook/zstd>
 
 
 === Header padding ===
@@ -575,11 +576,30 @@ Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):
                     Another compressed cluster may map to the tail of the final
                     sector used by this compressed cluster.
 
+                    The layout of the compressed data depends on the compression
+                    type used for the image (see compressed cluster layout).
+
 If a cluster is unallocated, read requests shall read the data from the backing
 file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
 no backing file or the backing file is smaller than the image, they shall read
 zeros for all parts that are not covered by the backing file.
 
+=== Compressed Cluster Layout ===
+
+The compressed cluster data has a layout depending on the compression
+type used for the image, as follows:
+
+Compressed data layout for the available compression types:
+data_space_lenght - data chunk length available to store a compressed cluster.
+(for more details see "Compressed Clusters Descriptor")
+x = data_space_length - 1
+
+    0:  (default)  zlib <http://zlib.net/>:
+            Byte  0 -  x:     the compressed data content
+                              all the space provided used for compressed data
+    1:  zstd <http://github.com/facebook/zstd>:
+            Byte  0 -  3:     the length of compressed data in bytes
+                  4 -  x:     the compressed data content
 
 == Snapshots ==
 
diff --git a/configure b/configure
index caa65f5883..b2a0aa241a 100755
--- a/configure
+++ b/configure
@@ -1835,7 +1835,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   lzfse           support of lzfse compression library
                   (for reading lzfse-compressed dmg images)
   zstd            support for zstd compression library
-                  (for migration compression)
+                  (for migration compression and qcow2 cluster compression)
   seccomp         seccomp support
   coroutine-pool  coroutine freelist (better performance)
   glusterfs       GlusterFS backend
diff --git a/qapi/block-core.json b/qapi/block-core.json
index a306484973..8953451818 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4401,11 +4401,12 @@
 # Compression type used in qcow2 image file
 #
 # @zlib: zlib compression, see <http://zlib.net/>
+# @zstd: zstd compression, see <http://github.com/facebook/zstd>
 #
 # Since: 5.0
 ##
 { 'enum': 'Qcow2CompressionType',
-  'data': [ 'zlib' ] }
+  'data': [ 'zlib', { 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } ] }
 
 ##
 # @BlockdevCreateOptionsQcow2:
diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index 7dbaf53489..b2d1c6d395 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -28,6 +28,11 @@
 #define ZLIB_CONST
 #include <zlib.h>
 
+#ifdef CONFIG_ZSTD
+#include <zstd.h>
+#include <zstd_errors.h>
+#endif
+
 #include "qcow2.h"
 #include "block/thread-pool.h"
 #include "crypto.h"
@@ -166,6 +171,115 @@ static ssize_t qcow2_zlib_decompress(void *dest, size_t dest_size,
     return ret;
 }
 
+#ifdef CONFIG_ZSTD
+
+/* The buffer size to store compressed chunk length */
+#define ZSTD_LEN_BUF 4
+
+/*
+ * qcow2_zstd_compress()
+ *
+ * Compress @src_size bytes of data using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *          -ENOMEM destination buffer is not enough to store compressed data
+ *          -EIO    on any other error
+ */
+static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
+                                   const void *src, size_t src_size)
+{
+    size_t ret;
+
+    /*
+     * steal ZSTD_LEN_BUF bytes in the very beginning of the buffer
+     * to store compressed chunk size
+     */
+    char *d_buf = ((char *) dest) + ZSTD_LEN_BUF;
+
+    /*
+     * sanity check that we can store the compressed data length,
+     * and there is some space left for the compressor buffer
+     */
+    if (dest_size <= ZSTD_LEN_BUF) {
+        return -ENOMEM;
+    }
+
+    dest_size -= ZSTD_LEN_BUF;
+
+    ret = ZSTD_compress(d_buf, dest_size, src, src_size, 5);
+
+    if (ZSTD_isError(ret)) {
+        if (ZSTD_getErrorCode(ret) == ZSTD_error_dstSize_tooSmall) {
+            return -ENOMEM;
+        } else {
+            return -EIO;
+        }
+    }
+
+    /*
+     * paranoid sanity check that we can store
+     * the compressed size in the first 4 bytes
+     */
+    if (ret > UINT32_MAX) {
+        return -ENOMEM;
+    }
+
+    /* store the compressed chunk size in the very beginning of the buffer */
+    stl_be_p(dest, ret);
+
+    return ret + ZSTD_LEN_BUF;
+}
+
+/*
+ * qcow2_zstd_decompress()
+ *
+ * Decompress some data (not more than @src_size bytes) to produce exactly
+ * @dest_size bytes using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: 0 on success
+ *          -EIO on any error
+ */
+static ssize_t qcow2_zstd_decompress(void *dest, size_t dest_size,
+                                     const void *src, size_t src_size)
+{
+    /*
+     * zstd decompress wants to know the exact length of the data.
+     * For that purpose, on compression, the length is stored in
+     * the very beginning of the compressed buffer
+     */
+    size_t s_size;
+    const char *s_buf = ((const char *) src) + ZSTD_LEN_BUF;
+
+    /*
+     * sanity check that we can read the 4-byte content length and
+     * that there is some content to decompress
+     */
+    if (src_size <= ZSTD_LEN_BUF) {
+        return -EIO;
+    }
+
+    s_size = ldl_be_p(src);
+
+    /* sanity check that the buffer is big enough to read the content from */
+    if (src_size - ZSTD_LEN_BUF < s_size) {
+        return -EIO;
+    }
+
+    if (ZSTD_isError(
+            ZSTD_decompress(dest, dest_size, s_buf, s_size))) {
+        return -EIO;
+    }
+
+    return 0;
+}
+#endif
+
 static int qcow2_compress_pool_func(void *opaque)
 {
     Qcow2CompressData *data = opaque;
@@ -217,6 +331,11 @@ qcow2_co_compress(BlockDriverState *bs, void *dest, size_t dest_size,
         fn = qcow2_zlib_compress;
         break;
 
+#ifdef CONFIG_ZSTD
+    case QCOW2_COMPRESSION_TYPE_ZSTD:
+        fn = qcow2_zstd_compress;
+        break;
+#endif
     default:
         abort();
     }
@@ -249,6 +368,11 @@ qcow2_co_decompress(BlockDriverState *bs, void *dest, size_t dest_size,
         fn = qcow2_zlib_decompress;
         break;
 
+#ifdef CONFIG_ZSTD
+    case QCOW2_COMPRESSION_TYPE_ZSTD:
+        fn = qcow2_zstd_decompress;
+        break;
+#endif
     default:
         abort();
     }
diff --git a/block/qcow2.c b/block/qcow2.c
index 21231adb63..53f65502f1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1246,6 +1246,9 @@ static int validate_compression_type(BDRVQcow2State *s, Error **errp)
 {
     switch (s->compression_type) {
     case QCOW2_COMPRESSION_TYPE_ZLIB:
+#ifdef CONFIG_ZSTD
+    case QCOW2_COMPRESSION_TYPE_ZSTD:
+#endif
         break;
 
     default:
@@ -1279,6 +1282,10 @@ static int qcow2_compression_type_from_format(const char *ct)
 {
     if (g_str_equal(ct, "zlib")) {
         return QCOW2_COMPRESSION_TYPE_ZLIB;
+#ifdef CONFIG_ZSTD
+    } else if (g_str_equal(ct, "zstd")) {
+        return QCOW2_COMPRESSION_TYPE_ZSTD;
+#endif
     } else {
         return -EINVAL;
     }
@@ -3463,6 +3470,10 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
         }
 
         switch (qcow2_opts->compression_type) {
+#ifdef CONFIG_ZSTD
+        case QCOW2_COMPRESSION_TYPE_ZSTD:
+            break;
+#endif
         default:
             error_setg(errp, "Unknown compression type");
             goto out;
-- 
2.17.0


Re: [PATCH v6 3/4] qcow2: add zstd cluster compression
Posted by Eric Blake 5 years, 8 months ago
On 3/12/20 4:22 AM, Denis Plotnikov wrote:
> zstd significantly reduces cluster compression time.
> It provides better compression performance while maintaining
> the same compression ratio as zlib, which is currently the
> only compression method available.
> 

> +++ b/docs/interop/qcow2.txt
> @@ -208,6 +208,7 @@ version 2.
>   
>                       Available compression type values:
>                           0: zlib <https://www.zlib.net/>
> +                        1: zstd <http://github.com/facebook/zstd>
>   
>   
>   === Header padding ===
> @@ -575,11 +576,30 @@ Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):
>                       Another compressed cluster may map to the tail of the final
>                       sector used by this compressed cluster.
>   
> +                    The layout of the compressed data depends on the compression
> +                    type used for the image (see compressed cluster layout).
> +
>   If a cluster is unallocated, read requests shall read the data from the backing
>   file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
>   no backing file or the backing file is smaller than the image, they shall read
>   zeros for all parts that are not covered by the backing file.
>   
> +=== Compressed Cluster Layout ===
> +
> +The compressed cluster data has a layout depending on the compression
> +type used for the image, as follows:
> +
> +Compressed data layout for the available compression types:
> +data_space_lenght - data chunk length available to store a compressed cluster.

length

> +(for more details see "Compressed Clusters Descriptor")
> +x = data_space_length - 1

If I understand correctly, data_space_length is really an upper bound 
on the length available, because it is computed by rounding UP to the 
next 512-byte boundary (that is, the L2 descriptor lists the number of 
additional sectors used in storing the compressed data).  Which really 
means that we have the following, where + is cluster boundaries, S and E 
are the start and end of the compressed data, and D is the offset 
determined by data_space_length:

+-------+-------+------+
       S============E...D

> +
> +    0:  (default)  zlib <http://zlib.net/>:
> +            Byte  0 -  x:     the compressed data content
> +                              all the space provided used for compressed data

For zlib, we have byte 0-E are compressed data, and bytes (E+1)-D (if 
any) are ignored.  There is no way to tell how many bytes between E and 
D exist, because zlib doesn't care (the compression stream itself 
ensures that decompression stops when input reaches E because the output 
reached a cluster boundary at that point).

> +    1:  zstd <http://github.com/facebook/zstd>:
> +            Byte  0 -  3:     the length of compressed data in bytes
> +                  4 -  x:     the compressed data content

Whereas for zstd, the decompression MUST know the actual location of E, 
rather than passing in the slop between E and D; bytes 0-3 give us that 
information.

But your description is not very accurate:  if 'x' is point E, then it 
is NOT data_space_length - 1, but rather data_space_length - slop, where 
slop can be up to 511 bytes (the number of bytes from (E+1) to D).  And 
if 'x' is point E, then the real layout for zlib is:

byte 0 - E: the compressed data content
byte E+1 - x: ignored slop (E is implied solely by the compressed data)

and for zstd is:

byte 0 - 3: the length of the compressed data
byte 4 - E: the compressed data (E computed from byte 0-3)
byte E+1 - x: ignored

I'm not sure what the best way is to document this.

> +++ b/block/qcow2-threads.c

> +static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
> +                                   const void *src, size_t src_size)
> +{
> +    size_t ret;
> +
> +    /*
> +     * steal ZSTD_LEN_BUF bytes in the very beginning of the buffer
> +     * to store compressed chunk size
> +     */
> +    char *d_buf = ((char *) dest) + ZSTD_LEN_BUF;
> +
> +    /*
> +     * sanity check that we can store the compressed data length,
> +     * and there is some space left for the compressor buffer
> +     */
> +    if (dest_size <= ZSTD_LEN_BUF) {
> +        return -ENOMEM;
> +    }
> +
> +    dest_size -= ZSTD_LEN_BUF;
> +
> +    ret = ZSTD_compress(d_buf, dest_size, src, src_size, 5);

Where does the magic number 5 come from?

> +
> +    if (ZSTD_isError(ret)) {
> +        if (ZSTD_getErrorCode(ret) == ZSTD_error_dstSize_tooSmall) {
> +            return -ENOMEM;
> +        } else {
> +            return -EIO;
> +        }
> +    }
> +
> +    /*
> +     * paranoid sanity check that we can store
> +     * the compressed size in the first 4 bytes
> +     */
> +    if (ret > UINT32_MAX) {
> +        return -ENOMEM;
> +    }

The if is awkward.  I'd prefer to change this to:

     /*
      * Our largest cluster is 2M, and we insist that compression
      * actually compressed things.
      */
     assert(ret < UINT32_MAX);

or even tighten to assert(ret <= dest_size)

> +
> +    /* store the compressed chunk size in the very beginning of the buffer */
> +    stl_be_p(dest, ret);
> +
> +    return ret + ZSTD_LEN_BUF;
> +}
> +
> +/*
> + * qcow2_zstd_decompress()
> + *
> + * Decompress some data (not more than @src_size bytes) to produce exactly
> + * @dest_size bytes using zstd compression method
> + *
> + * @dest - destination buffer, @dest_size bytes
> + * @src - source buffer, @src_size bytes
> + *
> + * Returns: 0 on success
> + *          -EIO on any error
> + */
> +static ssize_t qcow2_zstd_decompress(void *dest, size_t dest_size,
> +                                     const void *src, size_t src_size)
> +{
> +    /*
> +     * zstd decompress wants to know the exact length of the data.
> +     * For that purpose, on compression, the length is stored in
> +     * the very beginning of the compressed buffer
> +     */
> +    size_t s_size;
> +    const char *s_buf = ((const char *) src) + ZSTD_LEN_BUF;
> +
> +    /*
> +     * sanity check that we can read the 4-byte content length and
> +     * that there is some content to decompress
> +     */
> +    if (src_size <= ZSTD_LEN_BUF) {
> +        return -EIO;
> +    }
> +
> +    s_size = ldl_be_p(src);
> +
> +    /* sanity check that the buffer is big enough to read the content from */
> +    if (src_size - ZSTD_LEN_BUF < s_size) {
> +        return -EIO;
> +    }
> +
> +    if (ZSTD_isError(
> +            ZSTD_decompress(dest, dest_size, s_buf, s_size))) {

You are correct that ZSTD_decompress() is picky that it must be given 
the exact size of the compressed buffer it is decompressing.  But the 
ZSTD manual mentions that if an exact size is not known in advance, that 
the streaming API can be used instead:

https://facebook.github.io/zstd/zstd_manual.html#Chapter9

In other words, would it be possible to NOT have to prepend four bytes 
of exact size information, by instead setting up decompression via the 
streaming API where the input is (usually) oversized, but the output 
buffer limited to exactly one cluster is sufficient to consume the exact 
compressed data and ignore the slop, just as we do in zlib?

The rest of this patch looks okay.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


Re: [PATCH v6 3/4] qcow2: add zstd cluster compression
Posted by Denis Plotnikov 5 years, 8 months ago

On 16.03.2020 17:01, Eric Blake wrote:
> On 3/12/20 4:22 AM, Denis Plotnikov wrote:
>> zstd significantly reduces cluster compression time.
>> It provides better compression performance while maintaining
>> the same compression ratio as zlib, which is currently the
>> only compression method available.
>>
>
>> +++ b/docs/interop/qcow2.txt
>> @@ -208,6 +208,7 @@ version 2.
>>                         Available compression type values:
>>                           0: zlib <https://www.zlib.net/>
>> +                        1: zstd <http://github.com/facebook/zstd>
>>       === Header padding ===
>> @@ -575,11 +576,30 @@ Compressed Clusters Descriptor (x = 62 - 
>> (cluster_bits - 8)):
>>                       Another compressed cluster may map to the tail 
>> of the final
>>                       sector used by this compressed cluster.
>>   +                    The layout of the compressed data depends on 
>> the compression
>> +                    type used for the image (see compressed cluster 
>> layout).
>> +
>>   If a cluster is unallocated, read requests shall read the data from 
>> the backing
>>   file (except if bit 0 in the Standard Cluster Descriptor is set). 
>> If there is
>>   no backing file or the backing file is smaller than the image, they 
>> shall read
>>   zeros for all parts that are not covered by the backing file.
>>   +=== Compressed Cluster Layout ===
>> +
>> +The compressed cluster data has a layout depending on the compression
>> +type used for the image, as follows:
>> +
>> +Compressed data layout for the available compression types:
>> +data_space_lenght - data chunk length available to store a 
>> compressed cluster.
>
> length
>
>> +(for more details see "Compressed Clusters Descriptor")
>> +x = data_space_length - 1
>
> If I understand correctly, data_space_length is really an upper bound 
> on the length available, because it is computed by rounding UP to the 
> next 512-byte boundary (that is, the L2 descriptor lists the number of 
> additional sectors used in storing the compressed data).  Which really 
> means that we have the following, where + is cluster boundaries, S and 
> E are the start and end of the compressed data, and D is the offset 
> determined by data_space_length:
>
> +-------+-------+------+
>       S============E...D
>
>> +
>> +    0:  (default)  zlib <http://zlib.net/>:
>> +            Byte  0 -  x:     the compressed data content
>> +                              all the space provided used for 
>> compressed data
>
> For zlib, we have byte 0-E are compressed data, and bytes (E+1)-D (if 
> any) are ignored.  There is no way to tell how many bytes between E 
> and D exist, because zlib doesn't care (the compression stream itself 
> ensures that decompression stops when input reaches E because the 
> output reached a cluster boundary at that point).
>
>> +    1:  zstd <http://github.com/facebook/zstd>:
>> +            Byte  0 -  3:     the length of compressed data in bytes
>> +                  4 -  x:     the compressed data content
>
> Whereas for zstd, the decompression MUST know the actual location of 
> E, rather than passing in the slop between E and D; bytes 0-3 give us 
> that information.
>
> But your description is not very accurate:  if 'x' is point E, then it 
> is NOT data_space_length - 1, but rather data_space_length - slop, 
> where slop can be up to 511 bytes (the number of bytes from (E+1) to 
> D).  And if 'x' is point E, then the real layout for zlib is:
>
> byte 0 - E: the compressed data content
> byte E+1 - x: ignored slop (E is implied solely by the compressed data)
>
> and for zstd is:
>
> byte 0 - 3: the length of the compressed data
> byte 4 - E: the compressed data (E computed from byte 0-3)
> byte E+1 - x: ignored
>
> I'm not sure what the best way is to document this.
>
>> +++ b/block/qcow2-threads.c
>
>> +static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
>> +                                   const void *src, size_t src_size)
>> +{
>> +    size_t ret;
>> +
>> +    /*
>> +     * steal ZSTD_LEN_BUF bytes in the very beginning of the buffer
>> +     * to store compressed chunk size
>> +     */
>> +    char *d_buf = ((char *) dest) + ZSTD_LEN_BUF;
>> +
>> +    /*
>> +     * sanity check that we can store the compressed data length,
>> +     * and there is some space left for the compressor buffer
>> +     */
>> +    if (dest_size <= ZSTD_LEN_BUF) {
>> +        return -ENOMEM;
>> +    }
>> +
>> +    dest_size -= ZSTD_LEN_BUF;
>> +
>> +    ret = ZSTD_compress(d_buf, dest_size, src, src_size, 5);
>
> Where does the magic number 5 come from?
I did some tests to find a setting that gives the same compression 
ratio as zlib while compressing faster than zlib.
zlib also uses a hardcoded compression level. Making the compression 
level configurable for both compression types is something that can be 
done in later patches.
>
>
>> +
>> +    if (ZSTD_isError(ret)) {
>> +        if (ZSTD_getErrorCode(ret) == ZSTD_error_dstSize_tooSmall) {
>> +            return -ENOMEM;
>> +        } else {
>> +            return -EIO;
>> +        }
>> +    }
>> +
>> +    /*
>> +     * paranoid sanity check that we can store
>> +     * the compressed size in the first 4 bytes
>> +     */
>> +    if (ret > UINT32_MAX) {
>> +        return -ENOMEM;
>> +    }
>
> The if is awkward.  I'd prefer to change this to:
>
>     /*
>      * Our largest cluster is 2M, and we insist that compression
>      * actually compressed things.
>      */
>     assert(ret < UINT32_MAX);
>
> or even tighten to assert(ret <= dest_size)
>
>> +
>> +    /* store the compressed chunk size in the very beginning of the 
>> buffer */
>> +    stl_be_p(dest, ret);
>> +
>> +    return ret + ZSTD_LEN_BUF;
>> +}
>> +
>> +/*
>> + * qcow2_zstd_decompress()
>> + *
>> + * Decompress some data (not more than @src_size bytes) to produce 
>> exactly
>> + * @dest_size bytes using zstd compression method
>> + *
>> + * @dest - destination buffer, @dest_size bytes
>> + * @src - source buffer, @src_size bytes
>> + *
>> + * Returns: 0 on success
>> + *          -EIO on any error
>> + */
>> +static ssize_t qcow2_zstd_decompress(void *dest, size_t dest_size,
>> +                                     const void *src, size_t src_size)
>> +{
>> +    /*
>> +     * zstd decompress wants to know the exact length of the data.
>> +     * For that purpose, on compression, the length is stored in
>> +     * the very beginning of the compressed buffer
>> +     */
>> +    size_t s_size;
>> +    const char *s_buf = ((const char *) src) + ZSTD_LEN_BUF;
>> +
>> +    /*
>> +     * sanity check that we can read the 4-byte content length and
>> +     * that there is some content to decompress
>> +     */
>> +    if (src_size <= ZSTD_LEN_BUF) {
>> +        return -EIO;
>> +    }
>> +
>> +    s_size = ldl_be_p(src);
>> +
>> +    /* sanity check that the buffer is big enough to read the 
>> content from */
>> +    if (src_size - ZSTD_LEN_BUF < s_size) {
>> +        return -EIO;
>> +    }
>> +
>> +    if (ZSTD_isError(
>> +            ZSTD_decompress(dest, dest_size, s_buf, s_size))) {
>
> You are correct that ZSTD_decompress() is picky that it must be given 
> the exact size of the compressed buffer it is decompressing.  But the 
> ZSTD manual mentions that if an exact size is not known in advance, 
> that the streaming API can be used instead:
>
> https://facebook.github.io/zstd/zstd_manual.html#Chapter9
To be honest, I didn't find where they mention that explicitly. Could 
you please point out where exactly?

But I found the following:

   Calling ZSTD_compressStream2() with ZSTD_e_end instructs to finish a frame.
   It will perform a flush and write frame epilogue.
   The epilogue is required for decoders to consider a frame completed.
   flush operation is the same, and follows same rules as calling ZSTD_compressStream2() with ZSTD_e_flush.
   You must continue calling ZSTD_compressStream2() with ZSTD_e_end until it returns 0, at which point you are free to
   start a new frame

I think the epilogue stores the same information that I do, and potentially (I didn't check) some more to finish the frame.
So we wouldn't win any space. Additionally, using the streaming API would make the code more complex.

So I decided to stick with the simpler version.

>
> In other words, would it be possible to NOT have to prepend four bytes 
> of exact size information, by instead setting up decompression via the 
> streaming API where the input is (usually) oversized, but the output 
> buffer limited to exactly one cluster is sufficient to consume the 
> exact compressed data and ignore the slop, just as we do in zlib?
>
> The rest of this patch looks okay.
>