From: Nick Terrell
To: linux-kernel@vger.kernel.org
Cc: Nick Terrell, Kernel Team, David Sterba
Subject: [PATCH 1/1] zstd: Import upstream v1.5.7
Date: Thu, 13 Mar 2025 13:59:21 -0700
Message-ID: <20250313205923.4105088-2-nickrterrell@gmail.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250313205923.4105088-1-nickrterrell@gmail.com>
References: <20250313205923.4105088-1-nickrterrell@gmail.com>

From: Nick Terrell

In addition to keeping the kernel's copy of zstd up to date, this update
was requested by Intel to expose upstream's APIs that allow QAT to
accelerate the LZ match finding stage of zstd.

This patch is imported from the upstream tag v1.5.7-kernel [0], which is
signed with upstream's signing key EF8FE99528B52FFD [1]. It was imported
from upstream using this command:

  export ZSTD=/path/to/repo/zstd/
  export LINUX=/path/to/repo/linux/
  cd "$ZSTD/contrib/linux-kernel"
  git checkout v1.5.7-kernel
  make import LINUX="$LINUX"

This patch has been tested on x86-64, and has been boot tested with a
zstd-compressed kernel & initramfs on i386 and aarch64. I benchmarked the
patch on x86-64 with gcc-14.2.1 on an Intel i9-9900K by measuring the
performance of compressed filesystem reads and writes.

Component, Level, Size delta, C. time delta, D. time delta
Btrfs    ,     1,     +0.00%,         -6.1%,         +1.4%
Btrfs    ,     3,     +0.00%,         -9.8%,         +3.0%
Btrfs    ,     5,     +0.00%,         +1.7%,         +1.4%
Btrfs    ,     7,     +0.00%,         -1.9%,         +2.7%
Btrfs    ,     9,     +0.00%,         -3.4%,         +3.7%
Btrfs    ,    15,     +0.00%,         -0.3%,         +3.6%
SquashFS ,     1,     +0.00%,           N/A,         +1.9%

The major changes that impact the kernel use cases for each version are:

v1.5.7: https://github.com/facebook/zstd/releases/tag/v1.5.7
* Add zstd_compress_sequences_and_literals() for use by Intel's QAT driver
  to implement Zstd compression acceleration in the kernel.
* Fix an underflow bug in 32-bit builds that can cause data corruption when
  processing more than 4GB of data with a single `ZSTD_CCtx` object, when an
  input crosses the 4GB boundary. I don't believe this impacts any current
  kernel use cases, because the `ZSTD_CCtx` is typically reconstructed
  between compressions.
* Levels 1-4 see 5-10% compression speed improvements for inputs smaller
  than 128KB.

v1.5.6: https://github.com/facebook/zstd/releases/tag/v1.5.6
* Improved compression ratio for the highest compression levels. I don't
  expect these to see much use, however, due to their slow speeds.
v1.5.5: https://github.com/facebook/zstd/releases/tag/v1.5.5
* Fix a rare corruption bug that can trigger on levels 13 and above.
* Improve compression speed of levels 5-11 on incompressible data.

v1.5.4: https://github.com/facebook/zstd/releases/tag/v1.5.4
* Improve compression speed of levels 5-11 on ARM.
* Improve dictionary compression speed.

Signed-off-by: Nick Terrell
---
 include/linux/zstd.h                         |   87 +-
 include/linux/zstd_errors.h                  |   30 +-
 include/linux/zstd_lib.h                     | 1123 ++++--
 lib/zstd/Makefile                            |    3 +-
 lib/zstd/common/allocations.h                |   56 +
 lib/zstd/common/bits.h                       |  150 +
 lib/zstd/common/bitstream.h                  |  155 +-
 lib/zstd/common/compiler.h                   |  151 +-
 lib/zstd/common/cpu.h                        |    3 +-
 lib/zstd/common/debug.c                      |    9 +-
 lib/zstd/common/debug.h                      |   37 +-
 lib/zstd/common/entropy_common.c             |   42 +-
 lib/zstd/common/error_private.c              |   13 +-
 lib/zstd/common/error_private.h              |   88 +-
 lib/zstd/common/fse.h                        |  103 +-
 lib/zstd/common/fse_decompress.c             |  132 +-
 lib/zstd/common/huf.h                        |  240 +-
 lib/zstd/common/mem.h                        |    3 +-
 lib/zstd/common/portability_macros.h         |   45 +-
 lib/zstd/common/zstd_common.c                |   38 +-
 lib/zstd/common/zstd_deps.h                  |   16 +-
 lib/zstd/common/zstd_internal.h              |  153 +-
 lib/zstd/compress/clevels.h                  |    3 +-
 lib/zstd/compress/fse_compress.c             |   74 +-
 lib/zstd/compress/hist.c                     |   13 +-
 lib/zstd/compress/hist.h                     |   10 +-
 lib/zstd/compress/huf_compress.c             |  441 ++-
 lib/zstd/compress/zstd_compress.c            | 3293 ++++++++++++-----
 lib/zstd/compress/zstd_compress_internal.h   |  621 +++-
 lib/zstd/compress/zstd_compress_literals.c   |  157 +-
 lib/zstd/compress/zstd_compress_literals.h   |   25 +-
 lib/zstd/compress/zstd_compress_sequences.c  |   21 +-
 lib/zstd/compress/zstd_compress_sequences.h  |   16 +-
 lib/zstd/compress/zstd_compress_superblock.c |  394 +-
 lib/zstd/compress/zstd_compress_superblock.h |    3 +-
 lib/zstd/compress/zstd_cwksp.h               |  222 +-
 lib/zstd/compress/zstd_double_fast.c         |  245 +-
 lib/zstd/compress/zstd_double_fast.h         |   27 +-
 lib/zstd/compress/zstd_fast.c                |  703 +++-
 lib/zstd/compress/zstd_fast.h                |   16 +-
 lib/zstd/compress/zstd_lazy.c                |  840 +++--
 lib/zstd/compress/zstd_lazy.h                |  195 +-
 lib/zstd/compress/zstd_ldm.c                 |  102 +-
 lib/zstd/compress/zstd_ldm.h                 |   17 +-
 lib/zstd/compress/zstd_ldm_geartab.h         |    3 +-
 lib/zstd/compress/zstd_opt.c                 |  571 +--
 lib/zstd/compress/zstd_opt.h                 |   55 +-
 lib/zstd/compress/zstd_preSplit.c            |  239 ++
 lib/zstd/compress/zstd_preSplit.h            |   34 +
 lib/zstd/decompress/huf_decompress.c         |  887 +++--
 lib/zstd/decompress/zstd_ddict.c             |    9 +-
 lib/zstd/decompress/zstd_ddict.h             |    3 +-
 lib/zstd/decompress/zstd_decompress.c        |  375 +-
 lib/zstd/decompress/zstd_decompress_block.c  |  724 ++--
 lib/zstd/decompress/zstd_decompress_block.h  |   10 +-
 .../decompress/zstd_decompress_internal.h    |   19 +-
 lib/zstd/decompress_sources.h                |    2 +-
 lib/zstd/zstd_common_module.c                |    5 +-
 lib/zstd/zstd_compress_module.c              |   75 +-
 lib/zstd/zstd_decompress_module.c            |    4 +-
 60 files changed, 8749 insertions(+), 4381 deletions(-)
 create mode 100644 lib/zstd/common/allocations.h
 create mode 100644 lib/zstd/common/bits.h
 create mode 100644 lib/zstd/compress/zstd_preSplit.c
 create mode 100644 lib/zstd/compress/zstd_preSplit.h
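For reviewers unfamiliar with the updated wrapper API, here is a minimal,
illustrative sketch of single-pass compression using only declarations
visible in the include/linux/zstd.h hunk below. The function name, the
level (3), and the caller-provided workspace are assumptions of the
example, not part of this patch:

/* Hypothetical caller; not part of this patch. */
static size_t example_compress(void *workspace, size_t workspace_size,
			       void *dst, size_t dst_capacity,
			       const void *src, size_t src_size)
{
	const zstd_parameters params = zstd_get_params(3, src_size);
	zstd_cctx *cctx;

	if (workspace_size < zstd_cctx_workspace_bound(&params.cParams))
		return 0;
	cctx = zstd_init_cctx(workspace, workspace_size);
	if (!cctx)
		return 0;
	/* Returns the compressed size, or an error per zstd_is_error(). */
	return zstd_compress_cctx(cctx, dst, dst_capacity, src, src_size,
				  &params);
}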
diff --git a/include/linux/zstd.h b/include/linux/zstd.h
index b2c7cf310c8f..2f2a3c8b8a33 100644
--- a/include/linux/zstd.h
+++ b/include/linux/zstd.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -160,7 +160,6 @@ typedef ZSTD_parameters zstd_parameters;
 zstd_parameters zstd_get_params(int level,
	unsigned long long estimated_src_size);
 
-
 /**
  * zstd_get_cparams() - returns zstd_compression_parameters for selected level
@@ -173,9 +172,20 @@ zstd_parameters zstd_get_params(int level,
 zstd_compression_parameters zstd_get_cparams(int level,
	unsigned long long estimated_src_size, size_t dict_size);
 
-/* ====== Single-pass Compression ====== */
-
 typedef ZSTD_CCtx zstd_cctx;
+typedef ZSTD_cParameter zstd_cparameter;
+
+/**
+ * zstd_cctx_set_param() - sets a compression parameter
+ * @cctx:   The context. Must have been initialized with zstd_init_cctx().
+ * @param:  The parameter to set.
+ * @value:  The value to set the parameter to.
+ *
+ * Return:  Zero or an error, which can be checked using zstd_is_error().
+ */
+size_t zstd_cctx_set_param(zstd_cctx *cctx, zstd_cparameter param, int value);
+
+/* ====== Single-pass Compression ====== */
 
 /**
  * zstd_cctx_workspace_bound() - max memory needed to initialize a zstd_cctx
@@ -190,6 +200,20 @@ typedef ZSTD_CCtx zstd_cctx;
  */
 size_t zstd_cctx_workspace_bound(const zstd_compression_parameters *parameters);
 
+/**
+ * zstd_cctx_workspace_bound_with_ext_seq_prod() - max memory needed to
+ * initialize a zstd_cctx when using the block-level external sequence
+ * producer API.
+ * @parameters: The compression parameters to be used.
+ *
+ * If multiple compression parameters might be used, the caller must call
+ * this function for each set of parameters and use the maximum size.
+ *
+ * Return: A lower bound on the size of the workspace that is passed to
+ *         zstd_init_cctx().
+ */
+size_t zstd_cctx_workspace_bound_with_ext_seq_prod(const zstd_compression_parameters *parameters);
+
 /**
  * zstd_init_cctx() - initialize a zstd compression context
  * @workspace:      The workspace to emplace the context into. It must outlive
@@ -424,6 +448,16 @@ typedef ZSTD_CStream zstd_cstream;
  */
 size_t zstd_cstream_workspace_bound(const zstd_compression_parameters *cparams);
 
+/**
+ * zstd_cstream_workspace_bound_with_ext_seq_prod() - memory needed to initialize
+ * a zstd_cstream when using the block-level external sequence producer API.
+ * @cparams: The compression parameters to be used for compression.
+ *
+ * Return: A lower bound on the size of the workspace that is passed to
+ *         zstd_init_cstream().
+ */
+size_t zstd_cstream_workspace_bound_with_ext_seq_prod(const zstd_compression_parameters *cparams);
+
 /**
  * zstd_init_cstream() - initialize a zstd streaming compression context
  * @parameters The zstd parameters to use for compression.
@@ -583,6 +617,18 @@ size_t zstd_decompress_stream(zstd_dstream *dstream, zstd_out_buffer *output,
  */
 size_t zstd_find_frame_compressed_size(const void *src, size_t src_size);
 
+/**
+ * zstd_register_sequence_producer() - exposes the zstd library function
+ * ZSTD_registerSequenceProducer(). This is used for the block-level external
+ * sequence producer API. See upstream zstd.h for detailed documentation.
+ */
+typedef ZSTD_sequenceProducer_F zstd_sequence_producer_f;
+void zstd_register_sequence_producer(
+	zstd_cctx *cctx,
+	void* sequence_producer_state,
+	zstd_sequence_producer_f sequence_producer
+);
+
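/*
 * Illustrative sketch (not part of the diff): how a driver such as QAT
 * might size its workspace and hook a block-level sequence producer into
 * a compression context. The producer callback and its state are
 * hypothetical, supplied by the driver; the callback must match the
 * zstd_sequence_producer_f type declared above.
 */
static zstd_cctx *example_init_with_seq_prod(void *workspace,
		size_t workspace_size,
		const zstd_compression_parameters *cparams,
		void *producer_state,
		zstd_sequence_producer_f producer)
{
	zstd_cctx *cctx;

	/* The larger, producer-aware workspace bound must be used here. */
	if (workspace_size <
	    zstd_cctx_workspace_bound_with_ext_seq_prod(cparams))
		return NULL;
	cctx = zstd_init_cctx(workspace, workspace_size);
	if (cctx)
		zstd_register_sequence_producer(cctx, producer_state,
						producer);
	return cctx;
}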
 /**
  * struct zstd_frame_params - zstd frame parameters stored in the frame header
  * @frameContentSize: The frame content size, or ZSTD_CONTENTSIZE_UNKNOWN if not
@@ -596,7 +642,7 @@ size_t zstd_find_frame_compressed_size(const void *src, size_t src_size);
  *
  * See zstd_lib.h.
  */
-typedef ZSTD_frameHeader zstd_frame_header;
+typedef ZSTD_FrameHeader zstd_frame_header;
 
 /**
  * zstd_get_frame_header() - extracts parameters from a zstd or skippable frame
@@ -611,4 +657,35 @@ typedef ZSTD_frameHeader zstd_frame_header;
 size_t zstd_get_frame_header(zstd_frame_header *params, const void *src,
	size_t src_size);
 
+/**
+ * struct zstd_sequence - a sequence of literals or a match
+ *
+ * @offset:      The offset of the match
+ * @litLength:   The literal length of the sequence
+ * @matchLength: The match length of the sequence
+ * @rep:         Represents which repeat offset is used
+ */
+typedef ZSTD_Sequence zstd_sequence;
+
+/**
+ * zstd_compress_sequences_and_literals() - compress an array of zstd_sequence and literals
+ *
+ * @cctx:              The zstd compression context.
+ * @dst:               The buffer to compress the data into.
+ * @dst_capacity:      The size of the destination buffer.
+ * @in_seqs:           The array of zstd_sequence to compress.
+ * @in_seqs_size:      The number of sequences in in_seqs.
+ * @literals:          The literals associated to the sequences to be compressed.
+ * @lit_size:          The size of the literals in the literals buffer.
+ * @lit_capacity:      The size of the literals buffer.
+ * @decompressed_size: The size of the input data
+ *
+ * Return: The compressed size or an error, which can be checked using
+ *         zstd_is_error().
+ */
+size_t zstd_compress_sequences_and_literals(zstd_cctx *cctx, void* dst, size_t dst_capacity,
+	const zstd_sequence *in_seqs, size_t in_seqs_size,
+	const void* literals, size_t lit_size, size_t lit_capacity,
+	size_t decompressed_size);
+
 #endif /* LINUX_ZSTD_H */
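/*
 * Illustrative sketch (not part of the diff): once an accelerator has
 * produced the sequences and the matching literals for an input, the
 * kernel side asks zstd to entropy-code them into a complete frame.
 * The function and buffer names are hypothetical; on error a caller
 * would typically fall back to software compression.
 */
static size_t example_emit_frame(zstd_cctx *cctx, void *dst,
		size_t dst_capacity,
		const zstd_sequence *seqs, size_t nb_seqs,
		const void *literals, size_t lit_size, size_t lit_capacity,
		size_t src_size)
{
	size_t ret;

	ret = zstd_compress_sequences_and_literals(cctx, dst, dst_capacity,
			seqs, nb_seqs, literals, lit_size, lit_capacity,
			src_size);
	/* ret is the size of a complete zstd frame, or an error code. */
	return zstd_is_error(ret) ? 0 : ret;
}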
diff --git a/include/linux/zstd_errors.h b/include/linux/zstd_errors.h
index 58b6dd45a969..c307fb011132 100644
--- a/include/linux/zstd_errors.h
+++ b/include/linux/zstd_errors.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -12,13 +13,18 @@
 #define ZSTD_ERRORS_H_398273423
 
 
-/*===== dependency =====*/
-#include   /* size_t */
+/* =====   ZSTDERRORLIB_API : control library symbols visibility   ===== */
+#define ZSTDERRORLIB_VISIBLE
 
+#ifndef ZSTDERRORLIB_HIDDEN
+#  if (__GNUC__ >= 4) && !defined(__MINGW32__)
+#    define ZSTDERRORLIB_HIDDEN __attribute__ ((visibility ("hidden")))
+#  else
+#    define ZSTDERRORLIB_HIDDEN
+#  endif
+#endif
 
-/* =====   ZSTDERRORLIB_API : control library symbols visibility   ===== */
-#define ZSTDERRORLIB_VISIBILITY
-#define ZSTDERRORLIB_API ZSTDERRORLIB_VISIBILITY
+#define ZSTDERRORLIB_API ZSTDERRORLIB_VISIBLE
 
 /*-*********************************************
  *  Error codes list
@@ -43,14 +49,18 @@ typedef enum {
   ZSTD_error_frameParameter_windowTooLarge = 16,
   ZSTD_error_corruption_detected = 20,
   ZSTD_error_checksum_wrong = 22,
+  ZSTD_error_literals_headerWrong = 24,
   ZSTD_error_dictionary_corrupted = 30,
   ZSTD_error_dictionary_wrong = 32,
   ZSTD_error_dictionaryCreation_failed = 34,
   ZSTD_error_parameter_unsupported = 40,
+  ZSTD_error_parameter_combination_unsupported = 41,
   ZSTD_error_parameter_outOfBound = 42,
   ZSTD_error_tableLog_tooLarge = 44,
   ZSTD_error_maxSymbolValue_tooLarge = 46,
   ZSTD_error_maxSymbolValue_tooSmall = 48,
+  ZSTD_error_cannotProduce_uncompressedBlock = 49,
+  ZSTD_error_stabilityCondition_notRespected = 50,
   ZSTD_error_stage_wrong = 60,
   ZSTD_error_init_missing = 62,
   ZSTD_error_memory_allocation = 64,
@@ -58,18 +68,18 @@ typedef enum {
   ZSTD_error_dstSize_tooSmall = 70,
   ZSTD_error_srcSize_wrong = 72,
   ZSTD_error_dstBuffer_null = 74,
+  ZSTD_error_noForwardProgress_destFull = 80,
+  ZSTD_error_noForwardProgress_inputEmpty = 82,
   /* following error codes are __NOT STABLE__, they can be removed or changed in future versions */
   ZSTD_error_frameIndex_tooLarge = 100,
   ZSTD_error_seekableIO = 102,
   ZSTD_error_dstBuffer_wrong = 104,
   ZSTD_error_srcBuffer_wrong = 105,
+  ZSTD_error_sequenceProducer_failed = 106,
+  ZSTD_error_externalSequences_invalid = 107,
   ZSTD_error_maxCode = 120  /* never EVER use this value directly, it can change in future versions! Use ZSTD_isError() instead */
 } ZSTD_ErrorCode;
 
-/*! ZSTD_getErrorCode() :
-    convert a `size_t` function result into a `ZSTD_ErrorCode` enum type,
-    which can be used to compare with enum list published above */
-ZSTDERRORLIB_API ZSTD_ErrorCode ZSTD_getErrorCode(size_t functionResult);
 ZSTDERRORLIB_API const char* ZSTD_getErrorString(ZSTD_ErrorCode code);   /*< Same as ZSTD_getErrorName, but using a `ZSTD_ErrorCode` enum argument */
 
 
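/*
 * Illustrative sketch (not part of the diff): ZSTD_getErrorCode() moves
 * from zstd_errors.h into zstd_lib.h, but mapping a size_t result onto
 * the stable error enum is unchanged. example_check() is hypothetical;
 * pr_warn() assumes <linux/printk.h> and the errno values are this
 * example's own choice of mapping.
 */
static int example_check(size_t result)
{
	if (!ZSTD_isError(result))
		return 0;
	pr_warn("zstd: %s\n",
		ZSTD_getErrorString(ZSTD_getErrorCode(result)));
	return ZSTD_getErrorCode(result) == ZSTD_error_dstSize_tooSmall ?
	       -ENOSPC : -EINVAL;
}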
diff --git a/include/linux/zstd_lib.h b/include/linux/zstd_lib.h
index 79d55465d5c1..e295d4125dde 100644
--- a/include/linux/zstd_lib.h
+++ b/include/linux/zstd_lib.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -11,23 +12,47 @@
 #ifndef ZSTD_H_235446
 #define ZSTD_H_235446
 
-/* ======   Dependency   ======*/
-#include   /* INT_MAX */
+
+/* ======   Dependencies   ======*/
 #include   /* size_t */
 
+#include   /* list of errors */
+#if !defined(ZSTD_H_ZSTD_STATIC_LINKING_ONLY)
+#include   /* INT_MAX */
+#endif /* ZSTD_STATIC_LINKING_ONLY */
+
 
 /* =====   ZSTDLIB_API : control library symbols visibility   ===== */
-#ifndef ZSTDLIB_VISIBLE
+#define ZSTDLIB_VISIBLE
+
+#ifndef ZSTDLIB_HIDDEN
 #  if (__GNUC__ >= 4) && !defined(__MINGW32__)
-#    define ZSTDLIB_VISIBLE __attribute__ ((visibility ("default")))
 #    define ZSTDLIB_HIDDEN __attribute__ ((visibility ("hidden")))
 #  else
-#    define ZSTDLIB_VISIBLE
 #    define ZSTDLIB_HIDDEN
 #  endif
 #endif
+
 #define ZSTDLIB_API ZSTDLIB_VISIBLE
 
+/* Deprecation warnings :
+ * Should these warnings be a problem, it is generally possible to disable them,
+ * typically with -Wno-deprecated-declarations for gcc or _CRT_SECURE_NO_WARNINGS in Visual.
+ * Otherwise, it's also possible to define ZSTD_DISABLE_DEPRECATE_WARNINGS.
+ */
+#ifdef ZSTD_DISABLE_DEPRECATE_WARNINGS
+#  define ZSTD_DEPRECATED(message) /* disable deprecation warnings */
+#else
+#  if (defined(GNUC) && (GNUC > 4 || (GNUC == 4 && GNUC_MINOR >= 5))) || defined(__clang__) || defined(__IAR_SYSTEMS_ICC__)
+#    define ZSTD_DEPRECATED(message) __attribute__((deprecated(message)))
+#  elif (__GNUC__ >= 3)
+#    define ZSTD_DEPRECATED(message) __attribute__((deprecated))
+#  else
+#    pragma message("WARNING: You need to implement ZSTD_DEPRECATED for this compiler")
+#    define ZSTD_DEPRECATED(message)
+#  endif
+#endif /* ZSTD_DISABLE_DEPRECATE_WARNINGS */
+
 
 /* ******************************************************************************
    Introduction
@@ -65,7 +90,7 @@
 /*------   Version   ------*/
 #define ZSTD_VERSION_MAJOR    1
 #define ZSTD_VERSION_MINOR    5
-#define ZSTD_VERSION_RELEASE  2
+#define ZSTD_VERSION_RELEASE  7
 #define ZSTD_VERSION_NUMBER  (ZSTD_VERSION_MAJOR *100*100 + ZSTD_VERSION_MINOR *100 + ZSTD_VERSION_RELEASE)
 
 /*! ZSTD_versionNumber() :
@@ -103,11 +128,12 @@ ZSTDLIB_API const char* ZSTD_versionString(void);
 
 
 /* *************************************
-*  Simple API
+*  Simple Core API
 ***************************************/
 /*! ZSTD_compress() :
  *  Compresses `src` content as a single zstd compressed frame into already allocated `dst`.
- *  Hint : compression runs faster if `dstCapacity` >= `ZSTD_compressBound(srcSize)`.
+ *  NOTE: Providing `dstCapacity >= ZSTD_compressBound(srcSize)` guarantees that zstd will have
+ *        enough space to successfully compress the data.
  *  @return : compressed size written into `dst` (<= `dstCapacity),
  *            or an error code if it fails (which can be tested using ZSTD_isError()). */
 ZSTDLIB_API size_t ZSTD_compress( void* dst, size_t dstCapacity,
@@ -115,47 +141,55 @@ ZSTDLIB_API size_t ZSTD_compress( void* dst, size_t dstCapacity,
                             int compressionLevel);
 
 /*! ZSTD_decompress() :
- * `compressedSize` : must be the _exact_ size of some number of compressed and/or skippable frames.
- * `dstCapacity` is an upper bound of originalSize to regenerate.
- * If user cannot imply a maximum upper bound, it's better to use streaming mode to decompress data.
- * @return : the number of bytes decompressed into `dst` (<= `dstCapacity`),
- *           or an errorCode if it fails (which can be tested using ZSTD_isError()).
*/ + * `compressedSize` : must be the _exact_ size of some number of compresse= d and/or skippable frames. + * Multiple compressed frames can be decompressed at once with this metho= d. + * The result will be the concatenation of all decompressed frames, back = to back. + * `dstCapacity` is an upper bound of originalSize to regenerate. + * First frame's decompressed size can be extracted using ZSTD_getFrameCo= ntentSize(). + * If maximum upper bound isn't known, prefer using streaming mode to dec= ompress data. + * @return : the number of bytes decompressed into `dst` (<=3D `dstCapacit= y`), + * or an errorCode if it fails (which can be tested using ZSTD_i= sError()). */ ZSTDLIB_API size_t ZSTD_decompress( void* dst, size_t dstCapacity, const void* src, size_t compressedSize); =20 + +/*=3D=3D=3D=3D=3D=3D Decompression helper functions =3D=3D=3D=3D=3D=3D*/ + /*! ZSTD_getFrameContentSize() : requires v1.3.0+ - * `src` should point to the start of a ZSTD encoded frame. - * `srcSize` must be at least as large as the frame header. - * hint : any size >=3D `ZSTD_frameHeaderSize_max` is large eno= ugh. - * @return : - decompressed size of `src` frame content, if known - * - ZSTD_CONTENTSIZE_UNKNOWN if the size cannot be determined - * - ZSTD_CONTENTSIZE_ERROR if an error occurred (e.g. invalid = magic number, srcSize too small) - * note 1 : a 0 return value means the frame is valid but "empty". - * note 2 : decompressed size is an optional field, it may not be presen= t, typically in streaming mode. - * When `return=3D=3DZSTD_CONTENTSIZE_UNKNOWN`, data to decompr= ess could be any size. - * In which case, it's necessary to use streaming mode to decom= press data. - * Optionally, application can rely on some implicit limit, - * as ZSTD_decompress() only needs an upper bound of decompress= ed size. - * (For example, data could be necessarily cut into blocks <=3D= 16 KB). - * note 3 : decompressed size is always present when compression is comp= leted using single-pass functions, - * such as ZSTD_compress(), ZSTD_compressCCtx() ZSTD_compress_u= singDict() or ZSTD_compress_usingCDict(). - * note 4 : decompressed size can be very large (64-bits value), - * potentially larger than what local system can handle as a si= ngle memory segment. - * In which case, it's necessary to use streaming mode to decom= press data. - * note 5 : If source is untrusted, decompressed size could be wrong or = intentionally modified. - * Always ensure return value fits within application's authori= zed limits. - * Each application can set its own limits. - * note 6 : This function replaces ZSTD_getDecompressedSize() */ + * `src` should point to the start of a ZSTD encoded frame. + * `srcSize` must be at least as large as the frame header. + * hint : any size >=3D `ZSTD_frameHeaderSize_max` is large enou= gh. + * @return : - decompressed size of `src` frame content, if known + * - ZSTD_CONTENTSIZE_UNKNOWN if the size cannot be determined + * - ZSTD_CONTENTSIZE_ERROR if an error occurred (e.g. invalid m= agic number, srcSize too small) + * note 1 : a 0 return value means the frame is valid but "empty". + * When invoking this method on a skippable frame, it will retur= n 0. + * note 2 : decompressed size is an optional field, it may not be present= (typically in streaming mode). + * When `return=3D=3DZSTD_CONTENTSIZE_UNKNOWN`, data to decompre= ss could be any size. + * In which case, it's necessary to use streaming mode to decomp= ress data. 
+ * Optionally, application can rely on some implicit limit, + * as ZSTD_decompress() only needs an upper bound of decompresse= d size. + * (For example, data could be necessarily cut into blocks <=3D = 16 KB). + * note 3 : decompressed size is always present when compression is compl= eted using single-pass functions, + * such as ZSTD_compress(), ZSTD_compressCCtx() ZSTD_compress_us= ingDict() or ZSTD_compress_usingCDict(). + * note 4 : decompressed size can be very large (64-bits value), + * potentially larger than what local system can handle as a sin= gle memory segment. + * In which case, it's necessary to use streaming mode to decomp= ress data. + * note 5 : If source is untrusted, decompressed size could be wrong or i= ntentionally modified. + * Always ensure return value fits within application's authoriz= ed limits. + * Each application can set its own limits. + * note 6 : This function replaces ZSTD_getDecompressedSize() */ #define ZSTD_CONTENTSIZE_UNKNOWN (0ULL - 1) #define ZSTD_CONTENTSIZE_ERROR (0ULL - 2) ZSTDLIB_API unsigned long long ZSTD_getFrameContentSize(const void *src, s= ize_t srcSize); =20 -/*! ZSTD_getDecompressedSize() : - * NOTE: This function is now obsolete, in favor of ZSTD_getFrameContentS= ize(). +/*! ZSTD_getDecompressedSize() (obsolete): + * This function is now obsolete, in favor of ZSTD_getFrameContentSize(). * Both functions work the same way, but ZSTD_getDecompressedSize() blends * "empty", "unknown" and "error" results to the same return value (0), * while ZSTD_getFrameContentSize() gives them separate return values. * @return : decompressed size of `src` frame content _if known and not em= pty_, 0 otherwise. */ +ZSTD_DEPRECATED("Replaced by ZSTD_getFrameContentSize") ZSTDLIB_API unsigned long long ZSTD_getDecompressedSize(const void* src, s= ize_t srcSize); =20 /*! ZSTD_findFrameCompressedSize() : Requires v1.4.0+ @@ -163,18 +197,50 @@ ZSTDLIB_API unsigned long long ZSTD_getDecompressedSi= ze(const void* src, size_t * `srcSize` must be >=3D first frame size * @return : the compressed size of the first frame starting at `src`, * suitable to pass as `srcSize` to `ZSTD_decompress` or similar, - * or an error code if input is invalid */ + * or an error code if input is invalid + * Note 1: this method is called _find*() because it's not enough to read= the header, + * it may have to scan through the frame's content, to reach its = end. + * Note 2: this method also works with Skippable Frames. In which case, + * it returns the size of the complete skippable frame, + * which is always equal to its content size + 8 bytes for header= s. */ ZSTDLIB_API size_t ZSTD_findFrameCompressedSize(const void* src, size_t sr= cSize); =20 =20 -/*=3D=3D=3D=3D=3D=3D Helper functions =3D=3D=3D=3D=3D=3D*/ -#define ZSTD_COMPRESSBOUND(srcSize) ((srcSize) + ((srcSize)>>8) + (((src= Size) < (128<<10)) ? 
(((128<<10) - (srcSize)) >> 11) /* margin, from 64 to = 0 */ : 0)) /* this formula ensures that bound(A) + bound(B) <=3D bound(A+B= ) as long as A and B >=3D 128 KB */ -ZSTDLIB_API size_t ZSTD_compressBound(size_t srcSize); /*!< maximum c= ompressed size in worst case single-pass scenario */ -ZSTDLIB_API unsigned ZSTD_isError(size_t code); /*!< tells if = a `size_t` function result is an error code */ -ZSTDLIB_API const char* ZSTD_getErrorName(size_t code); /*!< provides = readable string from an error code */ -ZSTDLIB_API int ZSTD_minCLevel(void); /*!< minimum n= egative compression level allowed, requires v1.4.0+ */ -ZSTDLIB_API int ZSTD_maxCLevel(void); /*!< maximum c= ompression level available */ -ZSTDLIB_API int ZSTD_defaultCLevel(void); /*!< default c= ompression level, specified by ZSTD_CLEVEL_DEFAULT, requires v1.5.0+ */ +/*=3D=3D=3D=3D=3D=3D Compression helper functions =3D=3D=3D=3D=3D=3D*/ + +/*! ZSTD_compressBound() : + * maximum compressed size in worst case single-pass scenario. + * When invoking `ZSTD_compress()`, or any other one-pass compression func= tion, + * it's recommended to provide @dstCapacity >=3D ZSTD_compressBound(srcSiz= e) + * as it eliminates one potential failure scenario, + * aka not enough room in dst buffer to write the compressed frame. + * Note : ZSTD_compressBound() itself can fail, if @srcSize >=3D ZSTD_MAX_= INPUT_SIZE . + * In which case, ZSTD_compressBound() will return an error code + * which can be tested using ZSTD_isError(). + * + * ZSTD_COMPRESSBOUND() : + * same as ZSTD_compressBound(), but as a macro. + * It can be used to produce constants, which can be useful for static all= ocation, + * for example to size a static array on stack. + * Will produce constant value 0 if srcSize is too large. + */ +#define ZSTD_MAX_INPUT_SIZE ((sizeof(size_t)=3D=3D8) ? 0xFF00FF00FF00FF00U= LL : 0xFF00FF00U) +#define ZSTD_COMPRESSBOUND(srcSize) (((size_t)(srcSize) >=3D ZSTD_MAX_IN= PUT_SIZE) ? 0 : (srcSize) + ((srcSize)>>8) + (((srcSize) < (128<<10)) ? (((= 128<<10) - (srcSize)) >> 11) /* margin, from 64 to 0 */ : 0)) /* this form= ula ensures that bound(A) + bound(B) <=3D bound(A+B) as long as A and B >= =3D 128 KB */ +ZSTDLIB_API size_t ZSTD_compressBound(size_t srcSize); /*!< maximum compre= ssed size in worst case single-pass scenario */ + + +/*=3D=3D=3D=3D=3D=3D Error helper functions =3D=3D=3D=3D=3D=3D*/ +/* ZSTD_isError() : + * Most ZSTD_* functions returning a size_t value can be tested for error, + * using ZSTD_isError(). 
+ * @return 1 if error, 0 otherwise + */ +ZSTDLIB_API unsigned ZSTD_isError(size_t result); /*!< tells if a= `size_t` function result is an error code */ +ZSTDLIB_API ZSTD_ErrorCode ZSTD_getErrorCode(size_t functionResult); /* co= nvert a result into an error code, which can be compared to error enum list= */ +ZSTDLIB_API const char* ZSTD_getErrorName(size_t result); /*!< provides r= eadable string from a function result */ +ZSTDLIB_API int ZSTD_minCLevel(void); /*!< minimum ne= gative compression level allowed, requires v1.4.0+ */ +ZSTDLIB_API int ZSTD_maxCLevel(void); /*!< maximum co= mpression level available */ +ZSTDLIB_API int ZSTD_defaultCLevel(void); /*!< default co= mpression level, specified by ZSTD_CLEVEL_DEFAULT, requires v1.5.0+ */ =20 =20 /* ************************************* @@ -182,25 +248,25 @@ ZSTDLIB_API int ZSTD_defaultCLevel(void); = /*!< default compres ***************************************/ /*=3D Compression context * When compressing many times, - * it is recommended to allocate a context just once, - * and re-use it for each successive compression operation. - * This will make workload friendlier for system's memory. + * it is recommended to allocate a compression context just once, + * and reuse it for each successive compression operation. + * This will make the workload easier for system's memory. * Note : re-using context is just a speed / resource optimization. * It doesn't change the compression ratio, which remains identica= l. - * Note 2 : In multi-threaded environments, - * use one different context per thread for parallel execution. + * Note 2: For parallel execution in multi-threaded environments, + * use one different context per thread . */ typedef struct ZSTD_CCtx_s ZSTD_CCtx; ZSTDLIB_API ZSTD_CCtx* ZSTD_createCCtx(void); -ZSTDLIB_API size_t ZSTD_freeCCtx(ZSTD_CCtx* cctx); /* accept NULL poi= nter */ +ZSTDLIB_API size_t ZSTD_freeCCtx(ZSTD_CCtx* cctx); /* compatible with= NULL pointer */ =20 /*! ZSTD_compressCCtx() : * Same as ZSTD_compress(), using an explicit ZSTD_CCtx. - * Important : in order to behave similarly to `ZSTD_compress()`, - * this function compresses at requested compression level, - * __ignoring any other parameter__ . + * Important : in order to mirror `ZSTD_compress()` behavior, + * this function compresses at the requested compression level, + * __ignoring any other advanced parameter__ . * If any advanced parameter was set using the advanced API, - * they will all be reset. Only `compressionLevel` remains. + * they will all be reset. Only @compressionLevel remains. */ ZSTDLIB_API size_t ZSTD_compressCCtx(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, @@ -210,7 +276,7 @@ ZSTDLIB_API size_t ZSTD_compressCCtx(ZSTD_CCtx* cctx, /*=3D Decompression context * When decompressing many times, * it is recommended to allocate a context only once, - * and re-use it for each successive compression operation. + * and reuse it for each successive compression operation. * This will make workload friendlier for system's memory. * Use one context per thread for parallel execution. */ typedef struct ZSTD_DCtx_s ZSTD_DCtx; @@ -220,7 +286,7 @@ ZSTDLIB_API size_t ZSTD_freeDCtx(ZSTD_DCtx* dctx); = /* accept NULL pointer * /*! ZSTD_decompressDCtx() : * Same as ZSTD_decompress(), * requires an allocated ZSTD_DCtx. - * Compatible with sticky parameters. + * Compatible with sticky parameters (see below). 
*/ ZSTDLIB_API size_t ZSTD_decompressDCtx(ZSTD_DCtx* dctx, void* dst, size_t dstCapacity, @@ -236,12 +302,12 @@ ZSTDLIB_API size_t ZSTD_decompressDCtx(ZSTD_DCtx* dct= x, * using ZSTD_CCtx_set*() functions. * Pushed parameters are sticky : they are valid for next compressed fra= me, and any subsequent frame. * "sticky" parameters are applicable to `ZSTD_compress2()` and `ZSTD_co= mpressStream*()` ! - * __They do not apply to "simple" one-shot variants such as ZSTD_compre= ssCCtx()__ . + * __They do not apply to one-shot variants such as ZSTD_compressCCtx()_= _ . * * It's possible to reset all parameters to "default" using ZSTD_CCtx_re= set(). * * This API supersedes all other "advanced" API entry points in the expe= rimental section. - * In the future, we expect to remove from experimental API entry points= which are redundant with this API. + * In the future, we expect to remove API entry points from experimental= which are redundant with this API. */ =20 =20 @@ -324,6 +390,19 @@ typedef enum { * The higher the value of selected strategy,= the more complex it is, * resulting in stronger and slower compressi= on. * Special: value 0 means "use default strate= gy". */ + + ZSTD_c_targetCBlockSize=3D130, /* v1.5.6+ + * Attempts to fit compressed block size = into approximately targetCBlockSize. + * Bound by ZSTD_TARGETCBLOCKSIZE_MIN and= ZSTD_TARGETCBLOCKSIZE_MAX. + * Note that it's not a guarantee, just a= convergence target (default:0). + * No target when targetCBlockSize =3D=3D= 0. + * This is helpful in low bandwidth strea= ming environments to improve end-to-end latency, + * when a client can make use of partial = documents (a prominent example being Chrome). + * Note: this parameter is stable since v= 1.5.6. + * It was present as an experimental para= meter in earlier versions, + * but it's not recommended using it with= earlier library versions + * due to massive performance regressions. + */ /* LDM mode parameters */ ZSTD_c_enableLongDistanceMatching=3D160, /* Enable long distance match= ing. * This parameter is designed to impro= ve compression ratio @@ -403,15 +482,18 @@ typedef enum { * ZSTD_c_forceMaxWindow * ZSTD_c_forceAttachDict * ZSTD_c_literalCompressionMode - * ZSTD_c_targetCBlockSize * ZSTD_c_srcSizeHint * ZSTD_c_enableDedicatedDictSearch * ZSTD_c_stableInBuffer * ZSTD_c_stableOutBuffer * ZSTD_c_blockDelimiters * ZSTD_c_validateSequences - * ZSTD_c_useBlockSplitter + * ZSTD_c_blockSplitterLevel + * ZSTD_c_splitAfterSequences * ZSTD_c_useRowMatchFinder + * ZSTD_c_prefetchCDictTables + * ZSTD_c_enableSeqProducerFallback + * ZSTD_c_maxBlockSize * Because they are not stable, it's necessary to define ZSTD_STATIC_L= INKING_ONLY to access them. * note : never ever use experimentalParam? names directly; * also, the enums values themselves are unstable and can still= change. 
@@ -421,7 +503,7 @@ typedef enum { ZSTD_c_experimentalParam3=3D1000, ZSTD_c_experimentalParam4=3D1001, ZSTD_c_experimentalParam5=3D1002, - ZSTD_c_experimentalParam6=3D1003, + /* was ZSTD_c_experimentalParam6=3D1003; is now ZSTD_c_targetCBlockSi= ze */ ZSTD_c_experimentalParam7=3D1004, ZSTD_c_experimentalParam8=3D1005, ZSTD_c_experimentalParam9=3D1006, @@ -430,7 +512,12 @@ typedef enum { ZSTD_c_experimentalParam12=3D1009, ZSTD_c_experimentalParam13=3D1010, ZSTD_c_experimentalParam14=3D1011, - ZSTD_c_experimentalParam15=3D1012 + ZSTD_c_experimentalParam15=3D1012, + ZSTD_c_experimentalParam16=3D1013, + ZSTD_c_experimentalParam17=3D1014, + ZSTD_c_experimentalParam18=3D1015, + ZSTD_c_experimentalParam19=3D1016, + ZSTD_c_experimentalParam20=3D1017 } ZSTD_cParameter; =20 typedef struct { @@ -493,7 +580,7 @@ typedef enum { * They will be used to compress next frame. * Resetting session never fails. * - The parameters : changes all parameters back to "default". - * This removes any reference to any dictionary too. + * This also removes any reference to any dictionary or e= xternal sequence producer. * Parameters can only be changed between 2 sessions (i.e= . no compression is currently ongoing) * otherwise the reset fails, and function returns an err= or value (which can be tested using ZSTD_isError()) * - Both : similar to resetting the session, followed by resetting param= eters. @@ -502,11 +589,13 @@ ZSTDLIB_API size_t ZSTD_CCtx_reset(ZSTD_CCtx* cctx, Z= STD_ResetDirective reset); =20 /*! ZSTD_compress2() : * Behave the same as ZSTD_compressCCtx(), but compression parameters are= set using the advanced API. + * (note that this entry point doesn't even expose a compression level pa= rameter). * ZSTD_compress2() always starts a new frame. * Should cctx hold data from a previously unfinished frame, everything a= bout it is forgotten. * - Compression parameters are pushed into CCtx before starting compress= ion, using ZSTD_CCtx_set*() * - The function is always blocking, returns when compression is complet= ed. - * Hint : compression runs faster if `dstCapacity` >=3D `ZSTD_compressBo= und(srcSize)`. + * NOTE: Providing `dstCapacity >=3D ZSTD_compressBound(srcSize)` guarant= ees that zstd will have + * enough space to successfully compress the data, though it is pos= sible it fails for other reasons. * @return : compressed size written into `dst` (<=3D `dstCapacity), * or an error code if it fails (which can be tested using ZSTD_= isError()). */ @@ -543,13 +632,17 @@ typedef enum { * ZSTD_d_stableOutBuffer * ZSTD_d_forceIgnoreChecksum * ZSTD_d_refMultipleDDicts + * ZSTD_d_disableHuffmanAssembly + * ZSTD_d_maxBlockSize * Because they are not stable, it's necessary to define ZSTD_STATIC_L= INKING_ONLY to access them. * note : never ever use experimentalParam? names directly */ ZSTD_d_experimentalParam1=3D1000, ZSTD_d_experimentalParam2=3D1001, ZSTD_d_experimentalParam3=3D1002, - ZSTD_d_experimentalParam4=3D1003 + ZSTD_d_experimentalParam4=3D1003, + ZSTD_d_experimentalParam5=3D1004, + ZSTD_d_experimentalParam6=3D1005 =20 } ZSTD_dParameter; =20 @@ -604,14 +697,14 @@ typedef struct ZSTD_outBuffer_s { * A ZSTD_CStream object is required to track streaming operation. * Use ZSTD_createCStream() and ZSTD_freeCStream() to create/release resou= rces. * ZSTD_CStream objects can be reused multiple times on consecutive compre= ssion operations. -* It is recommended to re-use ZSTD_CStream since it will play nicer with = system's memory, by re-using already allocated memory. 
+* It is recommended to reuse ZSTD_CStream since it will play nicer with s= ystem's memory, by re-using already allocated memory. * * For parallel execution, use one separate ZSTD_CStream per thread. * * note : since v1.3.0, ZSTD_CStream and ZSTD_CCtx are the same thing. * * Parameters are sticky : when starting a new compression on the same con= text, -* it will re-use the same sticky parameters as previous compression sessi= on. +* it will reuse the same sticky parameters as previous compression sessio= n. * When in doubt, it's recommended to fully initialize the context before = usage. * Use ZSTD_CCtx_reset() to reset the context and ZSTD_CCtx_setParameter(), * ZSTD_CCtx_setPledgedSrcSize(), or ZSTD_CCtx_loadDictionary() and friend= s to @@ -700,6 +793,11 @@ typedef enum { * only ZSTD_e_end or ZSTD_e_flush operations are allowed. * Before starting a new compression job, or changing compressi= on parameters, * it is required to fully flush internal buffers. + * - note: if an operation ends with an error, it may leave @cctx in an u= ndefined state. + * Therefore, it's UB to invoke ZSTD_compressStream2() of ZSTD_co= mpressStream() on such a state. + * In order to be re-employed after an error, a state must be res= et, + * which can be done explicitly (ZSTD_CCtx_reset()), + * or is sometimes implied by methods starting a new compression = job (ZSTD_initCStream(), ZSTD_compressCCtx()) */ ZSTDLIB_API size_t ZSTD_compressStream2( ZSTD_CCtx* cctx, ZSTD_outBuffer* output, @@ -728,8 +826,6 @@ ZSTDLIB_API size_t ZSTD_CStreamOutSize(void); /*< rec= ommended size for output * This following is a legacy streaming API, available since v1.0+ . * It can be replaced by ZSTD_CCtx_reset() and ZSTD_compressStream2(). * It is redundant, but remains fully supported. - * Streaming in combination with advanced parameters and dictionary compre= ssion - * can only be used through the new API. *************************************************************************= *****/ =20 /*! @@ -738,6 +834,9 @@ ZSTDLIB_API size_t ZSTD_CStreamOutSize(void); /*< rec= ommended size for output * ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only); * ZSTD_CCtx_refCDict(zcs, NULL); // clear the dictionary (if any) * ZSTD_CCtx_setParameter(zcs, ZSTD_c_compressionLevel, compressionLev= el); + * + * Note that ZSTD_initCStream() clears any previously set dictionary. Use = the new API + * to compress with a dictionary. */ ZSTDLIB_API size_t ZSTD_initCStream(ZSTD_CStream* zcs, int compressionLeve= l); /*! @@ -758,7 +857,7 @@ ZSTDLIB_API size_t ZSTD_endStream(ZSTD_CStream* zcs, ZS= TD_outBuffer* output); * * A ZSTD_DStream object is required to track streaming operations. * Use ZSTD_createDStream() and ZSTD_freeDStream() to create/release resou= rces. -* ZSTD_DStream objects can be re-used multiple times. +* ZSTD_DStream objects can be re-employed multiple times. * * Use ZSTD_initDStream() to start a new decompression operation. * @return : recommended first input size @@ -768,16 +867,21 @@ ZSTDLIB_API size_t ZSTD_endStream(ZSTD_CStream* zcs, = ZSTD_outBuffer* output); * The function will update both `pos` fields. * If `input.pos < input.size`, some input has not been consumed. * It's up to the caller to present again remaining data. +* * The function tries to flush all data decoded immediately, respecting ou= tput buffer size. * If `output.pos < output.size`, decoder has flushed everything it could. 
-* But if `output.pos =3D=3D output.size`, there might be some data left w= ithin internal buffers., +* +* However, when `output.pos =3D=3D output.size`, it's more difficult to k= now. +* If @return > 0, the frame is not complete, meaning +* either there is still some data left to flush within internal buffers, +* or there is more input to read to complete the frame (or both). * In which case, call ZSTD_decompressStream() again to flush whatever rem= ains in the buffer. * Note : with no additional input provided, amount of data flushed is nec= essarily <=3D ZSTD_BLOCKSIZE_MAX. * @return : 0 when a frame is completely decoded and fully flushed, * or an error code, which can be tested using ZSTD_isError(), * or any other value > 0, which means there is still some decoding = or flushing to do to complete current frame : * the return value is a suggested next inpu= t size (just a hint for better latency) -* that will never request more than the rem= aining frame size. +* that will never request more than the rem= aining content of the compressed frame. * ************************************************************************= *******/ =20 typedef ZSTD_DCtx ZSTD_DStream; /*< DCtx and DStream are now effectively = same object (>=3D v1.3.0) */ @@ -788,13 +892,38 @@ ZSTDLIB_API size_t ZSTD_freeDStream(ZSTD_DStream* zds= ); /* accept NULL pointer =20 /*=3D=3D=3D=3D=3D Streaming decompression functions =3D=3D=3D=3D=3D*/ =20 -/* This function is redundant with the advanced API and equivalent to: +/*! ZSTD_initDStream() : + * Initialize/reset DStream state for new decompression operation. + * Call before new decompression operation using same DStream. * + * Note : This function is redundant with the advanced API and equivalent = to: * ZSTD_DCtx_reset(zds, ZSTD_reset_session_only); * ZSTD_DCtx_refDDict(zds, NULL); */ ZSTDLIB_API size_t ZSTD_initDStream(ZSTD_DStream* zds); =20 +/*! ZSTD_decompressStream() : + * Streaming decompression function. + * Call repetitively to consume full input updating it as necessary. + * Function will update both input and output `pos` fields exposing curren= t state via these fields: + * - `input.pos < input.size`, some input remaining and caller should prov= ide remaining input + * on the next call. + * - `output.pos < output.size`, decoder flushed internal output buffer. + * - `output.pos =3D=3D output.size`, unflushed data potentially present i= n the internal buffers, + * check ZSTD_decompressStream() @return value, + * if > 0, invoke it again to flush remaining data to output. + * Note : with no additional input, amount of data flushed <=3D ZSTD_BLOCK= SIZE_MAX. + * + * @return : 0 when a frame is completely decoded and fully flushed, + * or an error code, which can be tested using ZSTD_isError(), + * or any other value > 0, which means there is some decoding or= flushing to do to complete current frame. + * + * Note: when an operation returns with an error code, the @zds state may = be left in undefined state. + * It's UB to invoke `ZSTD_decompressStream()` on such a state. 
+ * In order to re-use such a state, it must be first reset, + * which can be done explicitly (`ZSTD_DCtx_reset()`), + * or is implied for operations starting some new decompression job = (`ZSTD_initDStream`, `ZSTD_decompressDCtx()`, `ZSTD_decompress_usingDict()`) + */ ZSTDLIB_API size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_outBuffer= * output, ZSTD_inBuffer* input); =20 ZSTDLIB_API size_t ZSTD_DStreamInSize(void); /*!< recommended size for = input buffer */ @@ -913,7 +1042,7 @@ ZSTDLIB_API unsigned ZSTD_getDictID_fromDDict(const ZS= TD_DDict* ddict); * If @return =3D=3D 0, the dictID could not be decoded. * This could for one of the following reasons : * - The frame does not require a dictionary to be decoded (most common c= ase). - * - The frame was built with dictID intentionally removed. Whatever dict= ionary is necessary is a hidden information. + * - The frame was built with dictID intentionally removed. Whatever dict= ionary is necessary is a hidden piece of information. * Note : this use case also happens when using a non-conformant dictio= nary. * - `srcSize` is too small, and as a result, the frame header could not = be decoded (only possible if `srcSize < ZSTD_FRAMEHEADERSIZE_MAX`). * - This is not a Zstandard frame. @@ -925,9 +1054,11 @@ ZSTDLIB_API unsigned ZSTD_getDictID_fromFrame(const v= oid* src, size_t srcSize); * Advanced dictionary and prefix API (Requires v1.4.0+) * * This API allows dictionaries to be used with ZSTD_compress2(), - * ZSTD_compressStream2(), and ZSTD_decompressDCtx(). Dictionaries are sti= cky, and - * only reset with the context is reset with ZSTD_reset_parameters or - * ZSTD_reset_session_and_parameters. Prefixes are single-use. + * ZSTD_compressStream2(), and ZSTD_decompressDCtx(). + * Dictionaries are sticky, they remain valid when same context is reused, + * they only reset when the context is reset + * with ZSTD_reset_parameters or ZSTD_reset_session_and_parameters. + * In contrast, Prefixes are single-use. *************************************************************************= *****/ =20 =20 @@ -937,8 +1068,9 @@ ZSTDLIB_API unsigned ZSTD_getDictID_fromFrame(const vo= id* src, size_t srcSize); * @result : 0, or an error code (which can be tested with ZSTD_isError()). * Special: Loading a NULL (or 0-size) dictionary invalidates previous di= ctionary, * meaning "return to no-dictionary mode". - * Note 1 : Dictionary is sticky, it will be used for all future compress= ed frames. - * To return to "no-dictionary" situation, load a NULL dictionar= y (or reset parameters). + * Note 1 : Dictionary is sticky, it will be used for all future compress= ed frames, + * until parameters are reset, a new dictionary is loaded, or th= e dictionary + * is explicitly invalidated by loading a NULL dictionary. * Note 2 : Loading a dictionary involves building tables. * It's also a CPU consuming operation, with non-negligible impa= ct on latency. * Tables are dependent on compression parameters, and for this = reason, @@ -947,11 +1079,15 @@ ZSTDLIB_API unsigned ZSTD_getDictID_fromFrame(const = void* src, size_t srcSize); * Use experimental ZSTD_CCtx_loadDictionary_byReference() to re= ference content instead. * In such a case, dictionary buffer must outlive its users. * Note 4 : Use ZSTD_CCtx_loadDictionary_advanced() - * to precisely select how dictionary content must be interprete= d. */ + * to precisely select how dictionary content must be interprete= d. + * Note 5 : This method does not benefit from LDM (long distance mode). 
+ * If you want to employ LDM on some large dictionary content, + * prefer employing ZSTD_CCtx_refPrefix() described below. + */ ZSTDLIB_API size_t ZSTD_CCtx_loadDictionary(ZSTD_CCtx* cctx, const void* d= ict, size_t dictSize); =20 /*! ZSTD_CCtx_refCDict() : Requires v1.4.0+ - * Reference a prepared dictionary, to be used for all next compressed fr= ames. + * Reference a prepared dictionary, to be used for all future compressed = frames. * Note that compression parameters are enforced from within CDict, * and supersede any compression parameter previously set within CCtx. * The parameters ignored are labelled as "superseded-by-cdict" in the ZS= TD_cParameter enum docs. @@ -970,6 +1106,7 @@ ZSTDLIB_API size_t ZSTD_CCtx_refCDict(ZSTD_CCtx* cctx,= const ZSTD_CDict* cdict); * Decompression will need same prefix to properly regenerate data. * Compressing with a prefix is similar in outcome as performing a diff a= nd compressing it, * but performs much faster, especially during decompression (compression= speed is tunable with compression level). + * This method is compatible with LDM (long distance mode). * @result : 0, or an error code (which can be tested with ZSTD_isError()). * Special: Adding any prefix (including NULL) invalidates any previous p= refix or dictionary * Note 1 : Prefix buffer is referenced. It **must** outlive compression. @@ -986,9 +1123,9 @@ ZSTDLIB_API size_t ZSTD_CCtx_refPrefix(ZSTD_CCtx* cctx, const void* prefix, size_t prefixSize); =20 /*! ZSTD_DCtx_loadDictionary() : Requires v1.4.0+ - * Create an internal DDict from dict buffer, - * to be used to decompress next frames. - * The dictionary remains valid for all future frames, until explicitly i= nvalidated. + * Create an internal DDict from dict buffer, to be used to decompress al= l future frames. + * The dictionary remains valid for all future frames, until explicitly i= nvalidated, or + * a new dictionary is loaded. * @result : 0, or an error code (which can be tested with ZSTD_isError()). * Special : Adding a NULL (or 0-size) dictionary invalidates any previou= s dictionary, * meaning "return to no-dictionary mode". @@ -1012,9 +1149,10 @@ ZSTDLIB_API size_t ZSTD_DCtx_loadDictionary(ZSTD_DCt= x* dctx, const void* dict, s * The memory for the table is allocated on the first call to refDDict, a= nd can be * freed with ZSTD_freeDCtx(). * + * If called with ZSTD_d_refMultipleDDicts disabled (the default), only o= ne dictionary + * will be managed, and referencing a dictionary effectively "discards" a= ny previous one. + * * @result : 0, or an error code (which can be tested with ZSTD_isError()). - * Note 1 : Currently, only one dictionary can be managed. - * Referencing a new dictionary effectively "discards" any previ= ous one. * Special: referencing a NULL DDict means "return to no-dictionary mode". * Note 2 : DDict is just referenced, its lifetime must outlive its usage= from DCtx. */ @@ -1051,6 +1189,7 @@ ZSTDLIB_API size_t ZSTD_sizeof_DStream(const ZSTD_DSt= ream* zds); ZSTDLIB_API size_t ZSTD_sizeof_CDict(const ZSTD_CDict* cdict); ZSTDLIB_API size_t ZSTD_sizeof_DDict(const ZSTD_DDict* ddict); =20 + #endif /* ZSTD_H_235446 */ =20 =20 @@ -1066,29 +1205,12 @@ ZSTDLIB_API size_t ZSTD_sizeof_DDict(const ZSTD_DDi= ct* ddict); #if !defined(ZSTD_H_ZSTD_STATIC_LINKING_ONLY) #define ZSTD_H_ZSTD_STATIC_LINKING_ONLY =20 + /* This can be overridden externally to hide static symbols. 
*/ #ifndef ZSTDLIB_STATIC_API #define ZSTDLIB_STATIC_API ZSTDLIB_VISIBLE #endif =20 -/* Deprecation warnings : - * Should these warnings be a problem, it is generally possible to disable= them, - * typically with -Wno-deprecated-declarations for gcc or _CRT_SECURE_NO_W= ARNINGS in Visual. - * Otherwise, it's also possible to define ZSTD_DISABLE_DEPRECATE_WARNINGS. - */ -#ifdef ZSTD_DISABLE_DEPRECATE_WARNINGS -# define ZSTD_DEPRECATED(message) ZSTDLIB_STATIC_API /* disable deprecat= ion warnings */ -#else -# if (defined(GNUC) && (GNUC > 4 || (GNUC =3D=3D 4 && GNUC_MINOR >=3D 5))= ) || defined(__clang__) -# define ZSTD_DEPRECATED(message) ZSTDLIB_STATIC_API __attribute__((dep= recated(message))) -# elif (__GNUC__ >=3D 3) -# define ZSTD_DEPRECATED(message) ZSTDLIB_STATIC_API __attribute__((dep= recated)) -# else -# pragma message("WARNING: You need to implement ZSTD_DEPRECATED for th= is compiler") -# define ZSTD_DEPRECATED(message) ZSTDLIB_STATIC_API -# endif -#endif /* ZSTD_DISABLE_DEPRECATE_WARNINGS */ - /* ***********************************************************************= *************** * experimental API (static linking only) *************************************************************************= *************** @@ -1123,6 +1245,7 @@ ZSTDLIB_API size_t ZSTD_sizeof_DDict(const ZSTD_DDict= * ddict); #define ZSTD_TARGETLENGTH_MIN 0 /* note : comparing this constant to= an unsigned results in a tautological test */ #define ZSTD_STRATEGY_MIN ZSTD_fast #define ZSTD_STRATEGY_MAX ZSTD_btultra2 +#define ZSTD_BLOCKSIZE_MAX_MIN (1 << 10) /* The minimum valid max blocksiz= e. Maximum blocksizes smaller than this make compressBound() inaccurate. */ =20 =20 #define ZSTD_OVERLAPLOG_MIN 0 @@ -1146,7 +1269,7 @@ ZSTDLIB_API size_t ZSTD_sizeof_DDict(const ZSTD_DDict= * ddict); #define ZSTD_LDM_HASHRATELOG_MAX (ZSTD_WINDOWLOG_MAX - ZSTD_HASHLOG_MIN) =20 /* Advanced parameter bounds */ -#define ZSTD_TARGETCBLOCKSIZE_MIN 64 +#define ZSTD_TARGETCBLOCKSIZE_MIN 1340 /* suitable to fit into an ethern= et / wifi / 4G transport frame */ #define ZSTD_TARGETCBLOCKSIZE_MAX ZSTD_BLOCKSIZE_MAX #define ZSTD_SRCSIZEHINT_MIN 0 #define ZSTD_SRCSIZEHINT_MAX INT_MAX @@ -1188,7 +1311,7 @@ typedef struct { * * Note: This field is optional. ZSTD_genera= teSequences() will calculate the value of * 'rep', but repeat offsets do not necessar= ily need to be calculated from an external - * sequence provider's perspective. For exam= ple, ZSTD_compressSequences() does not + * sequence provider perspective. For exampl= e, ZSTD_compressSequences() does not * use this 'rep' field at all (as of now). */ } ZSTD_Sequence; @@ -1293,17 +1416,18 @@ typedef enum { } ZSTD_literalCompressionMode_e; =20 typedef enum { - /* Note: This enum controls features which are conditionally beneficial.= Zstd typically will make a final - * decision on whether or not to enable the feature (ZSTD_ps_auto), but = setting the switch to ZSTD_ps_enable - * or ZSTD_ps_disable allow for a force enable/disable the feature. + /* Note: This enum controls features which are conditionally beneficial. + * Zstd can take a decision on whether or not to enable the feature (ZST= D_ps_auto), + * but setting the switch to ZSTD_ps_enable or ZSTD_ps_disable force ena= ble/disable the feature. 
   */
  ZSTD_ps_auto = 0,         /* Let the library automatically determine whether the feature shall be enabled */
  ZSTD_ps_enable = 1,       /* Force-enable the feature */
  ZSTD_ps_disable = 2       /* Do not use the feature */
-} ZSTD_paramSwitch_e;
+} ZSTD_ParamSwitch_e;
+#define ZSTD_paramSwitch_e ZSTD_ParamSwitch_e  /* old name */
 
 /* *************************************
-*  Frame size functions
+*  Frame header and size functions
 ***************************************/
 
 /*! ZSTD_findDecompressedSize() :
@@ -1345,34 +1469,130 @@ ZSTDLIB_STATIC_API unsigned long long ZSTD_findDecompressedSize(const void* src,
 ZSTDLIB_STATIC_API unsigned long long ZSTD_decompressBound(const void* src, size_t srcSize);
 
 /*! ZSTD_frameHeaderSize() :
- * srcSize must be >= ZSTD_FRAMEHEADERSIZE_PREFIX.
+ * srcSize must be large enough, aka >= ZSTD_FRAMEHEADERSIZE_PREFIX.
 * @return : size of the Frame Header,
 *           or an error code (if srcSize is too small) */
 ZSTDLIB_STATIC_API size_t ZSTD_frameHeaderSize(const void* src, size_t srcSize);
 
+typedef enum { ZSTD_frame, ZSTD_skippableFrame } ZSTD_FrameType_e;
+#define ZSTD_frameType_e ZSTD_FrameType_e  /* old name */
+typedef struct {
+    unsigned long long frameContentSize; /* if == ZSTD_CONTENTSIZE_UNKNOWN, it means this field is not available. 0 means "empty" */
+    unsigned long long windowSize;       /* can be very large, up to <= frameContentSize */
+    unsigned blockSizeMax;
+    ZSTD_FrameType_e frameType;          /* if == ZSTD_skippableFrame, frameContentSize is the size of skippable content */
+    unsigned headerSize;
+    unsigned dictID;                     /* for ZSTD_skippableFrame, contains the skippable magic variant [0-15] */
+    unsigned checksumFlag;
+    unsigned _reserved1;
+    unsigned _reserved2;
+} ZSTD_FrameHeader;
+#define ZSTD_frameHeader ZSTD_FrameHeader  /* old name */
+
+/*! ZSTD_getFrameHeader() :
+ *  decode Frame Header into `zfhPtr`, or requires larger `srcSize`.
+ * @return : 0 => header is complete, `zfhPtr` is correctly filled,
+ *          >0 => `srcSize` is too small, @return value is the wanted `srcSize` amount, `zfhPtr` is not filled,
+ *           or an error code, which can be tested using ZSTD_isError() */
+ZSTDLIB_STATIC_API size_t ZSTD_getFrameHeader(ZSTD_FrameHeader* zfhPtr, const void* src, size_t srcSize);
+/*! ZSTD_getFrameHeader_advanced() :
+ *  same as ZSTD_getFrameHeader(),
+ *  with added capability to select a format (like ZSTD_f_zstd1_magicless) */
+ZSTDLIB_STATIC_API size_t ZSTD_getFrameHeader_advanced(ZSTD_FrameHeader* zfhPtr, const void* src, size_t srcSize, ZSTD_format_e format);
+
+/*! ZSTD_decompressionMargin() :
+ * Zstd supports in-place decompression, where the input and output buffers overlap.
+ * In this case, the output buffer must be at least (Margin + Output_Size) bytes large,
+ * and the input buffer must be at the end of the output buffer.
+ *
+ *  _______________________ Output Buffer ________________________
+ * |                                                              |
+ * |                                        ____ Input Buffer ____|
+ * |                                       |                      |
+ * v                                       v                      v
+ * |---------------------------------------|-----------|----------|
+ * ^                                                   ^          ^
+ * |___________________ Output_Size ___________________|_ Margin _|
+ *
+ * NOTE: See also ZSTD_DECOMPRESSION_MARGIN().
+ * NOTE: This applies only to single-pass decompression through ZSTD_decompress() or
+ * ZSTD_decompressDCtx().
+ * NOTE: This function supports multi-frame input.
+ *
+ * @param src The compressed frame(s)
+ * @param srcSize The size of the compressed frame(s)
+ * @returns The decompression margin or an error that can be checked with ZSTD_isError().
+ */
+ZSTDLIB_STATIC_API size_t ZSTD_decompressionMargin(const void* src, size_t srcSize);
+
+/*! ZSTD_DECOMPRESS_MARGIN() :
+ * Similar to ZSTD_decompressionMargin(), but instead of computing the margin from
+ * the compressed frame, compute it from the original size and the blockSizeLog.
+ * See ZSTD_decompressionMargin() for details.
+ *
+ * WARNING: This macro does not support multi-frame input, the input must be a single
+ * zstd frame. If you need that support use the function, or implement it yourself.
+ *
+ * @param originalSize The original uncompressed size of the data.
+ * @param blockSize    The block size == MIN(windowSize, ZSTD_BLOCKSIZE_MAX).
+ *                     Unless you explicitly set the windowLog smaller than
+ *                     ZSTD_BLOCKSIZELOG_MAX you can just use ZSTD_BLOCKSIZE_MAX.
+ */
+#define ZSTD_DECOMPRESSION_MARGIN(originalSize, blockSize) ((size_t)(                                              \
+        ZSTD_FRAMEHEADERSIZE_MAX                                                              /* Frame header */ + \
+        4                                                                                         /* checksum */ + \
+        ((originalSize) == 0 ? 0 : 3 * (((originalSize) + (blockSize) - 1) / blockSize)) /* 3 bytes per block */ + \
+        (blockSize)                                                                    /* One block of margin */   \
+    ))
+
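To make the margin arithmetic concrete, an illustrative sketch (not from upstream) of in-place decompression using the macro above. It assumes default block sizing and a single frame; the buffer names are placeholders and the error sentinel is sketch-level only.

    #include <string.h>
    #include <zstd.h>

    /* Sketch: decompress a frame in place. The single buffer holds the
     * compressed data at its tail; output is written from the front. */
    size_t decompress_in_place(void* buffer, size_t bufferCap,
                               const void* frame, size_t frameSize,
                               size_t originalSize)
    {
        size_t const margin = ZSTD_DECOMPRESSION_MARGIN(originalSize, ZSTD_BLOCKSIZE_MAX);
        if (bufferCap < originalSize + margin) return (size_t)-1;  /* buffer too small (not a zstd error code) */
        /* place the input at the very end of the output buffer */
        memmove((char*)buffer + bufferCap - frameSize, frame, frameSize);
        return ZSTD_decompress(buffer, originalSize,
                               (char*)buffer + bufferCap - frameSize, frameSize);
    }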
 typedef enum {
-  ZSTD_sf_noBlockDelimiters = 0,         /* Representation of ZSTD_Sequence has no block delimiters, sequences only */
-  ZSTD_sf_explicitBlockDelimiters = 1    /* Representation of ZSTD_Sequence contains explicit block delimiters */
-} ZSTD_sequenceFormat_e;
+  ZSTD_sf_noBlockDelimiters = 0,         /* ZSTD_Sequence[] has no block delimiters, just sequences */
+  ZSTD_sf_explicitBlockDelimiters = 1    /* ZSTD_Sequence[] contains explicit block delimiters */
+} ZSTD_SequenceFormat_e;
+#define ZSTD_sequenceFormat_e ZSTD_SequenceFormat_e  /* old name */
+
+/*! ZSTD_sequenceBound() :
+ * `srcSize` : size of the input buffer
+ * @return : upper-bound for the number of sequences that can be generated
+ *           from a buffer of srcSize bytes
+ *
+ * note : returns number of sequences - to get bytes, multiply by sizeof(ZSTD_Sequence).
+ */
+ZSTDLIB_STATIC_API size_t ZSTD_sequenceBound(size_t srcSize);
 
 /*! ZSTD_generateSequences() :
- * Generate sequences using ZSTD_compress2, given a source buffer.
+ * WARNING: This function is meant for debugging and informational purposes ONLY!
+ * Its implementation is flawed, and it will be deleted in a future version.
+ * It is not guaranteed to succeed, as there are several cases where it will give
+ * up and fail. You should NOT use this function in production code.
+ *
+ * This function is deprecated, and will be removed in a future version.
+ *
+ * Generate sequences using ZSTD_compress2(), given a source buffer.
+ *
+ * @param zc The compression context to be used for ZSTD_compress2(). Set any
+ *           compression parameters you need on this context.
+ * @param outSeqs The output sequences buffer of size @p outSeqsSize
+ * @param outSeqsCapacity The size of the output sequences buffer.
+ *                        ZSTD_sequenceBound(srcSize) is an upper bound on the number
+ *                        of sequences that can be generated.
+ * @param src The source buffer to generate sequences from of size @p srcSize.
+ * @param srcSize The size of the source buffer.
 *
 * Each block will end with a dummy sequence
 * with offset == 0, matchLength == 0, and litLength == length of last literals.
 * litLength may be == 0, and if so, then the sequence of (of: 0 ml: 0 ll: 0)
 * simply acts as a block delimiter.
 *
- * zc can be used to insert custom compression params.
- * This function invokes ZSTD_compress2
- *
- * The output of this function can be fed into ZSTD_compressSequences() with CCtx
- * setting of ZSTD_c_blockDelimiters as ZSTD_sf_explicitBlockDelimiters
- * @return : number of sequences generated
+ * @returns The number of sequences generated, necessarily less than
+ *          ZSTD_sequenceBound(srcSize), or an error code that can be checked
+ *          with ZSTD_isError().
 */
-
-ZSTDLIB_STATIC_API size_t ZSTD_generateSequences(ZSTD_CCtx* zc, ZSTD_Sequence* outSeqs,
-                                                 size_t outSeqsSize, const void* src, size_t srcSize);
+ZSTD_DEPRECATED("For debugging only, will be replaced by ZSTD_extractSequences()")
+ZSTDLIB_STATIC_API size_t
+ZSTD_generateSequences(ZSTD_CCtx* zc,
+                       ZSTD_Sequence* outSeqs, size_t outSeqsCapacity,
+                       const void* src, size_t srcSize);
 
 /*! ZSTD_mergeBlockDelimiters() :
 * Given an array of ZSTD_Sequence, remove all sequences that represent block delimiters/last literals
@@ -1388,8 +1608,10 @@ ZSTDLIB_STATIC_API size_t ZSTD_generateSequences(ZSTD_CCtx* zc, ZSTD_Sequence* o
 ZSTDLIB_STATIC_API size_t ZSTD_mergeBlockDelimiters(ZSTD_Sequence* sequences, size_t seqsSize);
 
 /*! ZSTD_compressSequences() :
- * Compress an array of ZSTD_Sequence, generated from the original source buffer, into dst.
- * If a dictionary is included, then the cctx should reference the dict. (see: ZSTD_CCtx_refCDict(), ZSTD_CCtx_loadDictionary(), etc.)
+ * Compress an array of ZSTD_Sequence, associated with @src buffer, into dst.
+ * @src contains the entire input (not just the literals).
+ * If @srcSize > sum(sequence.length), the remaining bytes are considered all literals.
+ * If a dictionary is included, then the cctx should reference the dict (see: ZSTD_CCtx_refCDict(), ZSTD_CCtx_loadDictionary(), etc.).
 * The entire source is compressed into a single frame.
 *
 * The compression behavior changes based on cctx params. In particular:
@@ -1398,11 +1620,17 @@ ZSTDLIB_STATIC_API size_t ZSTD_mergeBlockDelimiters(ZSTD_Sequence* sequences, si
 *    the block size derived from the cctx, and sequences may be split. This is the default setting.
 *
 *    If ZSTD_c_blockDelimiters == ZSTD_sf_explicitBlockDelimiters, the array of ZSTD_Sequence is expected to contain
- *    block delimiters (defined in ZSTD_Sequence). Behavior is undefined if no block delimiters are provided.
+ *    valid block delimiters (defined in ZSTD_Sequence). Behavior is undefined if no block delimiters are provided.
+ *
+ *    When ZSTD_c_blockDelimiters == ZSTD_sf_explicitBlockDelimiters, it's possible to decide whether to generate repcodes
+ *    using the advanced parameter ZSTD_c_repcodeResolution. Repcodes will improve compression ratio, though the benefit
+ *    can vary greatly depending on Sequences. On the other hand, repcode resolution is an expensive operation.
+ *    By default, it's disabled at low (<10) compression levels, and enabled above the threshold (>=10).
+ *    ZSTD_c_repcodeResolution makes it possible to directly manage this processing in either direction.
+ *
- *    If ZSTD_c_validateSequences == 0, this function will blindly accept the sequences provided. Invalid sequences cause undefined
- *    behavior. If ZSTD_c_validateSequences == 1, then if sequence is invalid (see doc/zstd_compression_format.md for
- *    specifics regarding offset/matchlength requirements) then the function will bail out and return an error.
+ *    If ZSTD_c_validateSequences == 0, this function blindly accepts the Sequences provided. Invalid Sequences cause undefined
+ *    behavior. If ZSTD_c_validateSequences == 1, then the function will detect invalid Sequences (see doc/zstd_compression_format.md for
+ *    specifics regarding offset/matchlength requirements) and bail out and return an error.
 *
 *    In addition to the two adjustable experimental params, there are other important cctx params.
 *    - ZSTD_c_minMatch MUST be set as less than or equal to the smallest match generated by the match finder. It has a minimum value of ZSTD_MINMATCH_MIN.
@@ -1410,14 +1638,42 @@ ZSTDLIB_STATIC_API size_t ZSTD_mergeBlockDelimiters(ZSTD_Sequence* sequences, si
 *    - ZSTD_c_windowLog affects offset validation: this function will return an error at higher debug levels if a provided offset
 *      is larger than what the spec allows for a given window log and dictionary (if present). See: doc/zstd_compression_format.md
 *
- * Note: Repcodes are, as of now, always re-calculated within this function, so ZSTD_Sequence::rep is unused.
- * Note 2: Once we integrate ability to ingest repcodes, the explicit block delims mode must respect those repcodes exactly,
- *         and cannot emit an RLE block that disagrees with the repcode history
- * @return : final compressed size or a ZSTD error.
- */
-ZSTDLIB_STATIC_API size_t ZSTD_compressSequences(ZSTD_CCtx* const cctx, void* dst, size_t dstSize,
-                                                 const ZSTD_Sequence* inSeqs, size_t inSeqsSize,
-                                                 const void* src, size_t srcSize);
+ * Note: Repcodes are, as of now, always re-calculated within this function, ZSTD_Sequence.rep is effectively unused.
+ * Dev Note: Once the ability to ingest repcodes becomes available, the explicit block delims mode must respect those repcodes exactly,
+ *           and cannot emit an RLE block that disagrees with the repcode history.
+ * @return : final compressed size, or a ZSTD error code.
+ */
+ZSTDLIB_STATIC_API size_t
+ZSTD_compressSequences(ZSTD_CCtx* cctx,
+                       void* dst, size_t dstCapacity,
+                       const ZSTD_Sequence* inSeqs, size_t inSeqsSize,
+                       const void* src, size_t srcSize);
+
+
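As an illustrative usage sketch (not part of the header): compressing from a caller-supplied sequence parse in explicit-delimiter mode, with validation enabled so a malformed parse fails cleanly instead of producing undefined behavior. `seqs`/`nbSeqs` are assumed to be a valid parse of `src` that includes block delimiters.

    #include <zstd.h>

    /* Sketch: compress `src` from a caller-provided sequence parse. */
    size_t compress_from_sequences(ZSTD_CCtx* cctx,
                                   void* dst, size_t dstCap,
                                   const ZSTD_Sequence* seqs, size_t nbSeqs,
                                   const void* src, size_t srcSize)
    {
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_blockDelimiters,
                               ZSTD_sf_explicitBlockDelimiters);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_validateSequences, 1);  /* error out instead of UB */
        return ZSTD_compressSequences(cctx, dst, dstCap, seqs, nbSeqs, src, srcSize);
    }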
+/*! ZSTD_compressSequencesAndLiterals() :
+ * This is a variant of ZSTD_compressSequences() which,
+ * instead of receiving (src,srcSize) as input parameter, receives (literals,litSize),
+ * aka all the literals, already extracted and laid out into a single continuous buffer.
+ * This can be useful if the process generating the sequences also happens to generate the buffer of literals,
+ * thus skipping an extraction + caching stage.
+ * It's a speed optimization, useful when the right conditions are met,
+ * but it also features the following limitations:
+ * - Only supports explicit delimiter mode
+ * - Currently does not support Sequences validation (so input Sequences are trusted)
+ * - Not compatible with frame checksum, which must be disabled
+ * - If any block is incompressible, will fail and return an error
+ * - @litSize must be == sum of all @.litLength fields in @inSeqs. Any discrepancy will generate an error.
+ * - @litBufCapacity is the size of the underlying buffer into which literals are written, starting at address @literals.
+ *   @litBufCapacity must be at least 8 bytes larger than @litSize.
+ * - @decompressedSize must be correct, and correspond to the sum of all Sequences. Any discrepancy will generate an error.
+ * @return : final compressed size, or a ZSTD error code.
+ */
+ZSTDLIB_STATIC_API size_t
+ZSTD_compressSequencesAndLiterals(ZSTD_CCtx* cctx,
+                                  void* dst, size_t dstCapacity,
+                                  const ZSTD_Sequence* inSeqs, size_t nbSequences,
+                                  const void* literals, size_t litSize, size_t litBufCapacity,
+                                  size_t decompressedSize);
 
 
 /*! ZSTD_writeSkippableFrame() :
@@ -1425,8 +1681,8 @@ ZSTDLIB_STATIC_API size_t ZSTD_compressSequences(ZSTD_CCtx* const cctx, void* ds
 *
 * Skippable frames begin with a 4-byte magic number. There are 16 possible choices of magic number,
 * ranging from ZSTD_MAGIC_SKIPPABLE_START to ZSTD_MAGIC_SKIPPABLE_START+15.
- * As such, the parameter magicVariant controls the exact skippable frame magic number variant used, so
- * the magic number used will be ZSTD_MAGIC_SKIPPABLE_START + magicVariant.
+ * As such, the parameter magicVariant controls the exact skippable frame magic number variant used,
+ * so the magic number used will be ZSTD_MAGIC_SKIPPABLE_START + magicVariant.
 *
 * Returns an error if destination buffer is not large enough, if the source size is not representable
 * with a 4-byte unsigned int, or if the parameter magicVariant is greater than 15 (and therefore invalid).
@@ -1434,26 +1690,28 @@ ZSTDLIB_STATIC_API size_t ZSTD_compressSequences(ZSTD_CCtx* const cctx, void* ds
 * @return : number of bytes written or a ZSTD error.
 */
 ZSTDLIB_STATIC_API size_t ZSTD_writeSkippableFrame(void* dst, size_t dstCapacity,
-                                            const void* src, size_t srcSize, unsigned magicVariant);
+                                            const void* src, size_t srcSize,
+                                            unsigned magicVariant);
 
 /*! ZSTD_readSkippableFrame() :
- * Retrieves a zstd skippable frame containing data given by src, and writes it to dst buffer.
+ * Retrieves the content of a zstd skippable frame starting at @src, and writes it to @dst buffer.
 *
- * The parameter magicVariant will receive the magicVariant that was supplied when the frame was written,
- * i.e. magicNumber - ZSTD_MAGIC_SKIPPABLE_START.  This can be NULL if the caller is not interested
- * in the magicVariant.
+ * The parameter @magicVariant will receive the magicVariant that was supplied when the frame was written,
+ * i.e. magicNumber - ZSTD_MAGIC_SKIPPABLE_START.
+ * This can be NULL if the caller is not interested in the magicVariant.
 *
 * Returns an error if destination buffer is not large enough, or if the frame is not skippable.
 *
 * @return : number of bytes written or a ZSTD error.
 */
-ZSTDLIB_API size_t ZSTD_readSkippableFrame(void* dst, size_t dstCapacity, unsigned* magicVariant,
-                                           const void* src, size_t srcSize);
+ZSTDLIB_STATIC_API size_t ZSTD_readSkippableFrame(void* dst, size_t dstCapacity,
+                                                  unsigned* magicVariant,
+                                                  const void* src, size_t srcSize);
 
 /*! ZSTD_isSkippableFrame() :
 *  Tells if the content of `buffer` starts with a valid Frame Identifier for a skippable frame.
 */
-ZSTDLIB_API unsigned ZSTD_isSkippableFrame(const void* buffer, size_t size);
+ZSTDLIB_STATIC_API unsigned ZSTD_isSkippableFrame(const void* buffer, size_t size);
 
 
 
@@ -1464,48 +1722,59 @@ ZSTDLIB_API unsigned ZSTD_isSkippableFrame(const void* buffer, size_t size);
 
 /*! ZSTD_estimate*() :
 *  These functions make it possible to estimate memory usage
 *  of a future {D,C}Ctx, before its creation.
+ *  This is useful in combination with ZSTD_initStatic(),
+ *  which makes it possible to employ a static buffer for ZSTD_CCtx* state.
 *
 *  ZSTD_estimateCCtxSize() will provide a memory budget large enough
- *  for any compression level up to selected one.
- *  Note : Unlike ZSTD_estimateCStreamSize*(), this estimate
- *         does not include space for a window buffer.
- *         Therefore, the estimation is only guaranteed for single-shot compressions, not streaming.
+ *  to compress data of any size using one-shot compression ZSTD_compressCCtx() or ZSTD_compress2()
+ *  associated with any compression level up to the max specified one.
 *  The estimate will assume the input may be arbitrarily large,
 *  which is the worst case.
 *
+ *  Note that the size estimation is specific for one-shot compression,
+ *  it is not valid for streaming (see ZSTD_estimateCStreamSize*())
+ *  nor other potential ways of using a ZSTD_CCtx* state.
+ *
 *  When srcSize can be bound by a known and rather "small" value,
- *  this fact can be used to provide a tighter estimation
- *  because the CCtx compression context will need less memory.
- *  This tighter estimation can be provided by more advanced functions
+ *  this knowledge can be used to provide a tighter budget estimation
+ *  because the ZSTD_CCtx* state will need less memory for small inputs.
+ *  This tighter estimation can be provided by employing more advanced functions
 *  ZSTD_estimateCCtxSize_usingCParams(), which can be used in tandem with ZSTD_getCParams(),
 *  and ZSTD_estimateCCtxSize_usingCCtxParams(), which can be used in tandem with ZSTD_CCtxParams_setParameter().
 *  Both can be used to estimate memory using custom compression parameters and arbitrary srcSize limits.
 *
- *  Note 2 : only single-threaded compression is supported.
+ *  Note : only single-threaded compression is supported.
 *  ZSTD_estimateCCtxSize_usingCCtxParams() will return an error code if ZSTD_c_nbWorkers is >= 1.
 */
-ZSTDLIB_STATIC_API size_t ZSTD_estimateCCtxSize(int compressionLevel);
+ZSTDLIB_STATIC_API size_t ZSTD_estimateCCtxSize(int maxCompressionLevel);
 ZSTDLIB_STATIC_API size_t ZSTD_estimateCCtxSize_usingCParams(ZSTD_compressionParameters cParams);
 ZSTDLIB_STATIC_API size_t ZSTD_estimateCCtxSize_usingCCtxParams(const ZSTD_CCtx_params* params);
 ZSTDLIB_STATIC_API size_t ZSTD_estimateDCtxSize(void);
 
 /*! ZSTD_estimateCStreamSize() :
- *  ZSTD_estimateCStreamSize() will provide a budget large enough for any compression level up to selected one.
- *  It will also consider src size to be arbitrarily "large", which is worst case.
+ *  ZSTD_estimateCStreamSize() will provide a memory budget large enough for streaming compression
+ *  using any compression level up to the max specified one.
+ *  It will also consider src size to be arbitrarily "large", which is a worst case scenario.
 *  If srcSize is known to always be small, ZSTD_estimateCStreamSize_usingCParams() can provide a tighter estimation.
 *  ZSTD_estimateCStreamSize_usingCParams() can be used in tandem with ZSTD_getCParams() to create cParams from compressionLevel.
 *  ZSTD_estimateCStreamSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParams_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_c_nbWorkers is >= 1.
 *  Note : CStream size estimation is only correct for single-threaded compression.
- *  ZSTD_DStream memory budget depends on window Size.
+ *  ZSTD_estimateCStreamSize_usingCCtxParams() will return an error code if ZSTD_c_nbWorkers is >= 1.
+ *  Note 2 : ZSTD_estimateCStreamSize* functions are not compatible with the Block-Level Sequence Producer API at this time.
+ *  Size estimates assume that no external sequence producer is registered.
+ *
+ *  ZSTD_DStream memory budget depends on frame's window Size.
 *  This information can be passed manually, using ZSTD_estimateDStreamSize,
 *  or deducted from a valid frame Header, using ZSTD_estimateDStreamSize_fromFrame();
+ *  Any frame requesting a window size larger than the max specified one will be rejected.
 *  Note : if streaming is init with function ZSTD_init?Stream_usingDict(),
 *         an internal ?Dict will be created, which additional size is not estimated here.
- *         In this case, get total size by adding ZSTD_estimate?DictSize */
-ZSTDLIB_STATIC_API size_t ZSTD_estimateCStreamSize(int compressionLevel);
+ *         In this case, get total size by adding ZSTD_estimate?DictSize
+ */
+ZSTDLIB_STATIC_API size_t ZSTD_estimateCStreamSize(int maxCompressionLevel);
 ZSTDLIB_STATIC_API size_t ZSTD_estimateCStreamSize_usingCParams(ZSTD_compressionParameters cParams);
 ZSTDLIB_STATIC_API size_t ZSTD_estimateCStreamSize_usingCCtxParams(const ZSTD_CCtx_params* params);
-ZSTDLIB_STATIC_API size_t ZSTD_estimateDStreamSize(size_t windowSize);
+ZSTDLIB_STATIC_API size_t ZSTD_estimateDStreamSize(size_t maxWindowSize);
 ZSTDLIB_STATIC_API size_t ZSTD_estimateDStreamSize_fromFrame(const void* src, size_t srcSize);
 
 /*! ZSTD_estimate?DictSize() :
@@ -1568,7 +1837,15 @@ typedef void  (*ZSTD_freeFunction) (void* opaque, void* address);
 typedef struct { ZSTD_allocFunction customAlloc; ZSTD_freeFunction customFree; void* opaque; } ZSTD_customMem;
 static __attribute__((__unused__))
+
+#if defined(__clang__) && __clang_major__ >= 5
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wzero-as-null-pointer-constant"
+#endif
 ZSTD_customMem const ZSTD_defaultCMem = { NULL, NULL, NULL };  /*< this constant defers to stdlib's functions */
+#if defined(__clang__) && __clang_major__ >= 5
+#pragma clang diagnostic pop
+#endif
 
 ZSTDLIB_STATIC_API ZSTD_CCtx*    ZSTD_createCCtx_advanced(ZSTD_customMem customMem);
 ZSTDLIB_STATIC_API ZSTD_CStream* ZSTD_createCStream_advanced(ZSTD_customMem customMem);
@@ -1649,22 +1926,45 @@ ZSTDLIB_STATIC_API size_t ZSTD_checkCParams(ZSTD_compressionParameters params);
 *  This function never fails (wide contract) */
 ZSTDLIB_STATIC_API ZSTD_compressionParameters ZSTD_adjustCParams(ZSTD_compressionParameters cPar, unsigned long long srcSize, size_t dictSize);
 
+/*! ZSTD_CCtx_setCParams() :
+ *  Set all parameters provided within @p cparams into the working @p cctx.
+ *  Note : if modifying parameters during compression (MT mode only),
+ *         note that changes to the .windowLog parameter will be ignored.
+ * @return 0 on success, or an error code (can be checked with ZSTD_isError()).
+ *         On failure, no parameters are updated.
+ */
+ZSTDLIB_STATIC_API size_t ZSTD_CCtx_setCParams(ZSTD_CCtx* cctx, ZSTD_compressionParameters cparams);
+
+/*! ZSTD_CCtx_setFParams() :
+ *  Set all parameters provided within @p fparams into the working @p cctx.
+ * @return 0 on success, or an error code (can be checked with ZSTD_isError()).
+ */
+ZSTDLIB_STATIC_API size_t ZSTD_CCtx_setFParams(ZSTD_CCtx* cctx, ZSTD_frameParameters fparams);
+
+/*! ZSTD_CCtx_setParams() :
+ *  Set all parameters provided within @p params into the working @p cctx.
+ * @return 0 on success, or an error code (can be checked with ZSTD_isError()).
+ */
+ZSTDLIB_STATIC_API size_t ZSTD_CCtx_setParams(ZSTD_CCtx* cctx, ZSTD_parameters params);
+
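A brief illustrative sketch (not from upstream) of the new bulk setter: derive cParams for a known input size and apply them in one call. `level` and the buffers are placeholders.

    #include <zstd.h>

    /* Sketch: tune cParams for a known source size, then compress. */
    size_t compress_tuned(void* dst, size_t dstCap,
                          const void* src, size_t srcSize, int level)
    {
        ZSTD_CCtx* cctx = ZSTD_createCCtx();
        ZSTD_compressionParameters cparams = ZSTD_getCParams(level, srcSize, 0);
        ZSTD_CCtx_setCParams(cctx, cparams);   /* sets all cParams at once */
        size_t const r = ZSTD_compress2(cctx, dst, dstCap, src, srcSize);
        ZSTD_freeCCtx(cctx);
        return r;
    }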
 /*! ZSTD_compress_advanced() :
 *  Note : this function is now DEPRECATED.
 *         It can be replaced by ZSTD_compress2(), in combination with ZSTD_CCtx_setParameter() and other parameter setters.
 *  This prototype will generate compilation warnings. */
 ZSTD_DEPRECATED("use ZSTD_compress2")
+ZSTDLIB_STATIC_API
 size_t ZSTD_compress_advanced(ZSTD_CCtx* cctx,
-                              void* dst, size_t dstCapacity,
-                              const void* src, size_t srcSize,
-                              const void* dict,size_t dictSize,
-                              ZSTD_parameters params);
+                  void* dst, size_t dstCapacity,
+                  const void* src, size_t srcSize,
+                  const void* dict,size_t dictSize,
+                  ZSTD_parameters params);
 
 /*! ZSTD_compress_usingCDict_advanced() :
 *  Note : this function is now DEPRECATED.
 *         It can be replaced by ZSTD_compress2(), in combination with ZSTD_CCtx_loadDictionary() and other parameter setters.
 *  This prototype will generate compilation warnings. */
 ZSTD_DEPRECATED("use ZSTD_compress2 with ZSTD_CCtx_loadDictionary")
+ZSTDLIB_STATIC_API
 size_t ZSTD_compress_usingCDict_advanced(ZSTD_CCtx* cctx,
                                          void* dst, size_t dstCapacity,
                                          const void* src, size_t srcSize,
@@ -1725,7 +2025,7 @@ ZSTDLIB_STATIC_API size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const vo
 * See the comments on that enum for an explanation of the feature. */
 #define ZSTD_c_forceAttachDict ZSTD_c_experimentalParam4
 
-/* Controlled with ZSTD_paramSwitch_e enum.
+/* Controlled with ZSTD_ParamSwitch_e enum.
 * Default is ZSTD_ps_auto.
 * Set to ZSTD_ps_disable to never compress literals.
 * Set to ZSTD_ps_enable to always compress literals. (Note: uncompressed literals
@@ -1737,11 +2037,6 @@ ZSTDLIB_STATIC_API size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const vo
 */
 #define ZSTD_c_literalCompressionMode ZSTD_c_experimentalParam5
 
-/* Tries to fit compressed block size to be around targetCBlockSize.
- * No target when targetCBlockSize == 0.
- * There is no guarantee on compressed block size (default:0) */
-#define ZSTD_c_targetCBlockSize ZSTD_c_experimentalParam6
-
 /* User's best guess of source size.
 * Hint is not valid when srcSizeHint == 0.
 * There is no guarantee that hint is close to actual source size,
@@ -1808,13 +2103,16 @@ ZSTDLIB_STATIC_API size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const vo
 * Experimental parameter.
 * Default is 0 == disabled. Set to 1 to enable.
 *
- * Tells the compressor that the ZSTD_inBuffer will ALWAYS be the same
- * between calls, except for the modifications that zstd makes to pos (the
- * caller must not modify pos). This is checked by the compressor, and
- * compression will fail if it ever changes. This means the only flush
- * mode that makes sense is ZSTD_e_end, so zstd will error if ZSTD_e_end
- * is not used. The data in the ZSTD_inBuffer in the range [src, src + pos)
- * MUST not be modified during compression or you will get data corruption.
+ * Tells the compressor that input data presented with ZSTD_inBuffer
+ * will ALWAYS be the same between calls.
+ * Technically, the @src pointer must never be changed,
+ * and the @pos field can only be updated by zstd.
+ * However, it's possible to increase the @size field,
+ * allowing scenarios where more data can be appended after compression starts.
+ * These conditions are checked by the compressor,
+ * and compression will fail if they are not respected.
+ * Also, data in the ZSTD_inBuffer within the range [src, src + pos)
+ * MUST not be modified during compression or it will result in data corruption.
 *
 * When this flag is enabled zstd won't allocate an input window buffer,
 * because the user guarantees it can reference the ZSTD_inBuffer until
@@ -1822,18 +2120,15 @@ ZSTDLIB_STATIC_API size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const vo
 * large enough to fit a block (see ZSTD_c_stableOutBuffer). This will also
 * avoid the memcpy() from the input buffer to the input window buffer.
 *
- * NOTE: ZSTD_compressStream2() will error if ZSTD_e_end is not used.
- * That means this flag cannot be used with ZSTD_compressStream().
- *
 * NOTE: So long as the ZSTD_inBuffer always points to valid memory, using
 * this flag is ALWAYS memory safe, and will never access out-of-bounds
- * memory. However, compression WILL fail if you violate the preconditions.
+ * memory. However, compression WILL fail if conditions are not respected.
 *
- * WARNING: The data in the ZSTD_inBuffer in the range [dst, dst + pos) MUST
- * not be modified during compression or you will get data corruption. This
- * is because zstd needs to reference data in the ZSTD_inBuffer to find
+ * WARNING: The data in the ZSTD_inBuffer in the range [src, src + pos) MUST
+ * not be modified during compression or it will result in data corruption.
+ * This is because zstd needs to reference data in the ZSTD_inBuffer to find
 * matches. Normally zstd maintains its own window buffer for this purpose,
- * but passing this flag tells zstd to use the user provided buffer.
+ * but passing this flag tells zstd to rely on the user-provided buffer instead.
 */
 #define ZSTD_c_stableInBuffer ZSTD_c_experimentalParam9
 
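An illustrative sketch (not from upstream) of single-shot streaming with a stable input buffer; it assumes `src` stays valid and unmodified until the frame is complete, and the names are placeholders.

    #include <zstd.h>

    /* Sketch: compress with ZSTD_c_stableInBuffer, promising zstd that
     * `src` stays valid and untouched for the whole compression. */
    size_t compress_stable_input(ZSTD_CCtx* cctx,
                                 void* dst, size_t dstCap,
                                 const void* src, size_t srcSize)
    {
        ZSTD_CCtx_reset(cctx, ZSTD_reset_session_only);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_stableInBuffer, 1);
        ZSTD_inBuffer  in  = { src, srcSize, 0 };
        ZSTD_outBuffer out = { dst, dstCap, 0 };
        size_t const r = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
        if (ZSTD_isError(r)) return r;
        return out.pos;  /* r == 0 means the frame is complete */
    }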
@@ -1871,22 +2166,46 @@ ZSTDLIB_STATIC_API size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const vo
 /* ZSTD_c_validateSequences
 * Default is 0 == disabled. Set to 1 to enable sequence validation.
 *
- * For use with sequence compression API: ZSTD_compressSequences().
- * Designates whether or not we validate sequences provided to ZSTD_compressSequences()
+ * For use with sequence compression API: ZSTD_compressSequences*().
+ * Designates whether or not provided sequences are validated within ZSTD_compressSequences*()
 * during function execution.
 *
- * Without validation, providing a sequence that does not conform to the zstd spec will cause
- * undefined behavior, and may produce a corrupted block.
+ * When Sequence validation is disabled (default), Sequences are compressed as-is,
+ * so they must be correct, otherwise it would result in a corruption error.
 *
- * With validation enabled, if sequence is invalid (see doc/zstd_compression_format.md for
+ * Sequence validation adds some protection, by ensuring that all values respect boundary conditions.
+ * If a Sequence is detected invalid (see doc/zstd_compression_format.md for
 * specifics regarding offset/matchlength requirements) then the function will bail out and
 * return an error.
- *
 */
 #define ZSTD_c_validateSequences ZSTD_c_experimentalParam12
 
-/* ZSTD_c_useBlockSplitter
- * Controlled with ZSTD_paramSwitch_e enum.
+/* ZSTD_c_blockSplitterLevel
+ * note: this parameter only influences the first splitter stage,
+ *       which is active before producing the sequences.
+ *       ZSTD_c_splitAfterSequences controls the next splitter stage,
+ *       which is active after sequence production.
+ *       Note that both can be combined.
+ * Allowed values are between 0 and ZSTD_BLOCKSPLITTER_LEVEL_MAX included.
+ * 0 means "auto", which will select a value depending on current ZSTD_c_strategy.
+ * 1 means no splitting.
+ * Then, values from 2 to 6 are sorted in increasing cpu load order.
+ *
+ * Note that currently the first block is never split,
+ * to ensure expansion guarantees in the presence of incompressible data.
+ */
+#define ZSTD_BLOCKSPLITTER_LEVEL_MAX 6
+#define ZSTD_c_blockSplitterLevel ZSTD_c_experimentalParam20
+
+/* ZSTD_c_splitAfterSequences
+ * This is a stronger splitter algorithm,
+ * based on actual sequences previously produced by the selected parser.
+ * It's also slower, and as a consequence, mostly used for high compression levels.
+ * While the post-splitter does overlap with the pre-splitter,
+ * both can nonetheless be combined,
+ * notably with ZSTD_c_blockSplitterLevel at ZSTD_BLOCKSPLITTER_LEVEL_MAX,
+ * resulting in higher compression ratio than just one of them.
+ *
 * Default is ZSTD_ps_auto.
 * Set to ZSTD_ps_disable to never use block splitter.
 * Set to ZSTD_ps_enable to always use block splitter.
@@ -1894,10 +2213,10 @@ ZSTDLIB_STATIC_API size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const vo
 * By default, in ZSTD_ps_auto, the library will decide at runtime whether to use
 * block splitting based on the compression parameters.
 */
-#define ZSTD_c_useBlockSplitter ZSTD_c_experimentalParam13
+#define ZSTD_c_splitAfterSequences ZSTD_c_experimentalParam13
 
 /* ZSTD_c_useRowMatchFinder
- * Controlled with ZSTD_paramSwitch_e enum.
+ * Controlled with ZSTD_ParamSwitch_e enum.
 * Default is ZSTD_ps_auto.
 * Set to ZSTD_ps_disable to never use row-based matchfinder.
 * Set to ZSTD_ps_enable to force usage of row-based matchfinder.
@@ -1928,6 +2247,80 @@ ZSTDLIB_STATIC_API size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const vo
 */
 #define ZSTD_c_deterministicRefPrefix ZSTD_c_experimentalParam15
 
+/* ZSTD_c_prefetchCDictTables
+ * Controlled with ZSTD_ParamSwitch_e enum. Default is ZSTD_ps_auto.
+ *
+ * In some situations, zstd uses CDict tables in-place rather than copying them
+ * into the working context. (See docs on ZSTD_dictAttachPref_e above for details).
+ * In such situations, compression speed is seriously impacted when CDict tables are
+ * "cold" (outside CPU cache). This parameter instructs zstd to prefetch CDict tables
+ * when they are used in-place.
+ *
+ * For sufficiently small inputs, the cost of the prefetch will outweigh the benefit.
+ * For sufficiently large inputs, zstd will by default memcpy() CDict tables
+ * into the working context, so there is no need to prefetch. This parameter is
+ * targeted at a middle range of input sizes, where a prefetch is cheap enough to be
+ * useful but memcpy() is too expensive. The exact range of input sizes where this
+ * makes sense is best determined by careful experimentation.
+ *
+ * Note: for this parameter, ZSTD_ps_auto is currently equivalent to ZSTD_ps_disable,
+ * but in the future zstd may conditionally enable this feature via an auto-detection
+ * heuristic for cold CDicts.
+ * Use ZSTD_ps_disable to opt out of prefetching under any circumstances.
+ */
+#define ZSTD_c_prefetchCDictTables ZSTD_c_experimentalParam16
+
+/* ZSTD_c_enableSeqProducerFallback
+ * Allowed values are 0 (disable) and 1 (enable). The default setting is 0.
+ *
+ * Controls whether zstd will fall back to an internal sequence producer if an
+ * external sequence producer is registered and returns an error code. This fallback
+ * is block-by-block: the internal sequence producer will only be called for blocks
+ * where the external sequence producer returns an error code. Fallback parsing will
+ * follow any other cParam settings, such as compression level, the same as in a
+ * normal (fully-internal) compression operation.
+ *
+ * The user is strongly encouraged to read the full Block-Level Sequence Producer API
+ * documentation (below) before setting this parameter. */
+#define ZSTD_c_enableSeqProducerFallback ZSTD_c_experimentalParam17
+
+/* ZSTD_c_maxBlockSize
+ * Allowed values are between 1KB and ZSTD_BLOCKSIZE_MAX (128KB).
+ * The default is ZSTD_BLOCKSIZE_MAX, and setting to 0 will set to the default.
+ *
+ * This parameter can be used to set an upper bound on the blocksize
+ * that overrides the default ZSTD_BLOCKSIZE_MAX. It cannot be used to set upper
+ * bounds greater than ZSTD_BLOCKSIZE_MAX or bounds lower than 1KB (will make
+ * compressBound() inaccurate). Only currently meant to be used for testing.
+ */
+#define ZSTD_c_maxBlockSize ZSTD_c_experimentalParam18
+
+/* ZSTD_c_repcodeResolution
+ * This parameter only has an effect if ZSTD_c_blockDelimiters is
+ * set to ZSTD_sf_explicitBlockDelimiters (may change in the future).
+ *
+ * This parameter affects how zstd parses external sequences,
+ * provided via the ZSTD_compressSequences*() API
+ * or from an external block-level sequence producer.
+ *
+ * If set to ZSTD_ps_enable, the library will check for repeated offsets within
+ * external sequences, even if those repcodes are not explicitly indicated in
+ * the "rep" field. Note that this is the only way to exploit repcode matches
+ * while using compressSequences*() or an external sequence producer, since zstd
+ * currently ignores the "rep" field of external sequences.
+ *
+ * If set to ZSTD_ps_disable, the library will not exploit repeated offsets in
+ * external sequences, regardless of whether the "rep" field has been set. This
+ * reduces sequence compression overhead by about 25% while sacrificing some
+ * compression ratio.
+ *
+ * The default value is ZSTD_ps_auto, for which the library will enable/disable
+ * based on compression level (currently: level<10 disables, level>=10 enables).
+ */
+#define ZSTD_c_repcodeResolution ZSTD_c_experimentalParam19
+#define ZSTD_c_searchForExternalRepcodes ZSTD_c_experimentalParam19 /* older name */
+
+
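A tiny illustrative sketch (not from upstream) of forcing repcode resolution on, independent of compression level; it assumes explicit block delimiters, as the documentation above requires.

    #include <zstd.h>

    /* Sketch: opt into repcode search for externally provided sequences. */
    void enable_repcode_resolution(ZSTD_CCtx* cctx)
    {
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_blockDelimiters,
                               ZSTD_sf_explicitBlockDelimiters);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_repcodeResolution, ZSTD_ps_enable);
    }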
 /*! ZSTD_CCtx_getParameter() :
 *  Get the requested compression parameter value, selected by enum ZSTD_cParameter,
 *  and store it into int* value.
@@ -2084,7 +2477,7 @@ ZSTDLIB_STATIC_API size_t ZSTD_DCtx_getParameter(ZSTD_DCtx* dctx, ZSTD_dParamete
 * in the range [dst, dst + pos) MUST not be modified during decompression
 * or you will get data corruption.
 *
- * When this flags is enabled zstd won't allocate an output buffer, because
+ * When this flag is enabled zstd won't allocate an output buffer, because
 * it can write directly to the ZSTD_outBuffer, but it will still allocate
 * an input buffer large enough to fit any compressed block. This will also
 * avoid the memcpy() from the internal output buffer to the ZSTD_outBuffer.
@@ -2137,6 +2530,33 @@ ZSTDLIB_STATIC_API size_t ZSTD_DCtx_getParameter(ZSTD_DCtx* dctx, ZSTD_dParamete
 */
 #define ZSTD_d_refMultipleDDicts ZSTD_d_experimentalParam4
 
+/* ZSTD_d_disableHuffmanAssembly
+ * Set to 1 to disable the Huffman assembly implementation.
+ * The default value is 0, which allows zstd to use the Huffman assembly
+ * implementation if available.
+ *
+ * This parameter can be used to disable Huffman assembly at runtime.
+ * If you want to disable it at compile time you can define the macro
+ * ZSTD_DISABLE_ASM.
+ */
+#define ZSTD_d_disableHuffmanAssembly ZSTD_d_experimentalParam5
+
+/* ZSTD_d_maxBlockSize
+ * Allowed values are between 1KB and ZSTD_BLOCKSIZE_MAX (128KB).
+ * The default is ZSTD_BLOCKSIZE_MAX, and setting to 0 will set to the default.
+ *
+ * Forces the decompressor to reject blocks whose content size is
+ * larger than the configured maxBlockSize. When maxBlockSize is
+ * larger than the windowSize, the windowSize is used instead.
+ * This saves memory on the decoder when you know all blocks are small.
+ *
+ * This option is typically used in conjunction with ZSTD_c_maxBlockSize.
+ *
+ * WARNING: This causes the decoder to reject otherwise valid frames
+ * that have block sizes larger than the configured maxBlockSize.
+ */
+#define ZSTD_d_maxBlockSize ZSTD_d_experimentalParam6
+
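An illustrative pairing of the two limits (not from upstream), so a memory-constrained decoder can rely on small blocks; the 16 KB figure is an arbitrary example value.

    #include <zstd.h>

    /* Sketch: producer caps block size; consumer enforces the same cap. */
    void configure_small_blocks(ZSTD_CCtx* cctx, ZSTD_DCtx* dctx)
    {
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_maxBlockSize, 16 * 1024);
        ZSTD_DCtx_setParameter(dctx, ZSTD_d_maxBlockSize, 16 * 1024);
    }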
 
 /*! ZSTD_DCtx_setFormat() :
 *  This function is REDUNDANT. Prefer ZSTD_DCtx_setParameter().
@@ -2145,6 +2565,7 @@ ZSTDLIB_STATIC_API size_t ZSTD_DCtx_getParameter(ZSTD_DCtx* dctx, ZSTD_dParamete
 *  such ZSTD_f_zstd1_magicless for example.
 * @return : 0, or an error code (which can be tested using ZSTD_isError()). */
 ZSTD_DEPRECATED("use ZSTD_DCtx_setParameter() instead")
+ZSTDLIB_STATIC_API
 size_t ZSTD_DCtx_setFormat(ZSTD_DCtx* dctx, ZSTD_format_e format);
 
 /*! ZSTD_decompressStream_simpleArgs() :
@@ -2181,6 +2602,7 @@ ZSTDLIB_STATIC_API size_t ZSTD_decompressStream_simpleArgs (
 *  This prototype will generate compilation warnings. */
 ZSTD_DEPRECATED("use ZSTD_CCtx_reset, see zstd.h for detailed instructions")
+ZSTDLIB_STATIC_API
 size_t ZSTD_initCStream_srcSize(ZSTD_CStream* zcs,
                                 int compressionLevel,
                                 unsigned long long pledgedSrcSize);
@@ -2198,17 +2620,15 @@ size_t ZSTD_initCStream_srcSize(ZSTD_CStream* zcs,
 *  This prototype will generate compilation warnings. */
 ZSTD_DEPRECATED("use ZSTD_CCtx_reset, see zstd.h for detailed instructions")
+ZSTDLIB_STATIC_API
 size_t ZSTD_initCStream_usingDict(ZSTD_CStream* zcs,
                                   const void* dict, size_t dictSize,
                                   int compressionLevel);
 
 /*! ZSTD_initCStream_advanced() :
- * This function is DEPRECATED, and is approximately equivalent to:
+ * This function is DEPRECATED, and is equivalent to:
 *     ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only);
- *     // Pseudocode: Set each zstd parameter and leave the rest as-is.
- *     for ((param, value) : params) {
- *         ZSTD_CCtx_setParameter(zcs, param, value);
- *     }
+ *     ZSTD_CCtx_setParams(zcs, params);
 *     ZSTD_CCtx_setPledgedSrcSize(zcs, pledgedSrcSize);
 *     ZSTD_CCtx_loadDictionary(zcs, dict, dictSize);
 *
@@ -2218,6 +2638,7 @@ size_t ZSTD_initCStream_usingDict(ZSTD_CStream* zcs,
 *  This prototype will generate compilation warnings. */
 ZSTD_DEPRECATED("use ZSTD_CCtx_reset, see zstd.h for detailed instructions")
+ZSTDLIB_STATIC_API
 size_t ZSTD_initCStream_advanced(ZSTD_CStream* zcs,
                                  const void* dict, size_t dictSize,
                                  ZSTD_parameters params,
@@ -2232,15 +2653,13 @@ size_t ZSTD_initCStream_advanced(ZSTD_CStream* zcs,
 *  This prototype will generate compilation warnings. */
 ZSTD_DEPRECATED("use ZSTD_CCtx_reset and ZSTD_CCtx_refCDict, see zstd.h for detailed instructions")
+ZSTDLIB_STATIC_API
 size_t ZSTD_initCStream_usingCDict(ZSTD_CStream* zcs, const ZSTD_CDict* cdict);
 
 /*! ZSTD_initCStream_usingCDict_advanced() :
- * This function is DEPRECATED, and is approximately equivalent to:
+ * This function is DEPRECATED, and is equivalent to:
 *     ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only);
- *     // Pseudocode: Set each zstd frame parameter and leave the rest as-is.
- *     for ((fParam, value) : fParams) {
- *         ZSTD_CCtx_setParameter(zcs, fParam, value);
- *     }
+ *     ZSTD_CCtx_setFParams(zcs, fParams);
 *     ZSTD_CCtx_setPledgedSrcSize(zcs, pledgedSrcSize);
 *     ZSTD_CCtx_refCDict(zcs, cdict);
 *
@@ -2250,6 +2669,7 @@ size_t ZSTD_initCStream_usingCDict(ZSTD_CStream* zcs, const ZSTD_CDict* cdict);
 *  This prototype will generate compilation warnings. */
 ZSTD_DEPRECATED("use ZSTD_CCtx_reset and ZSTD_CCtx_refCDict, see zstd.h for detailed instructions")
+ZSTDLIB_STATIC_API
 size_t ZSTD_initCStream_usingCDict_advanced(ZSTD_CStream* zcs,
                                             const ZSTD_CDict* cdict,
                                             ZSTD_frameParameters fParams,
@@ -2264,7 +2684,7 @@ size_t ZSTD_initCStream_usingCDict_advanced(ZSTD_CStream* zcs,
 *  explicitly specified.
 *
 *  start a new frame, using same parameters from previous frame.
- *  This is typically useful to skip dictionary loading stage, since it will re-use it in-place.
+ *  This is typically useful to skip dictionary loading stage, since it will reuse it in-place.
 *  Note that zcs must be init at least once before using ZSTD_resetCStream().
 *  If pledgedSrcSize is not known at reset time, use macro ZSTD_CONTENTSIZE_UNKNOWN.
 *  If pledgedSrcSize > 0, its value must be correct, as it will be written in header, and controlled at the end.
@@ -2274,6 +2694,7 @@ size_t ZSTD_initCStream_usingCDict_advanced(ZSTD_CStream* zcs,
 *  This prototype will generate compilation warnings. */
 ZSTD_DEPRECATED("use ZSTD_CCtx_reset, see zstd.h for detailed instructions")
+ZSTDLIB_STATIC_API
 size_t ZSTD_resetCStream(ZSTD_CStream* zcs, unsigned long long pledgedSrcSize);
 
 
@@ -2319,8 +2740,8 @@ ZSTDLIB_STATIC_API size_t ZSTD_toFlushNow(ZSTD_CCtx* cctx);
 *     ZSTD_DCtx_loadDictionary(zds, dict, dictSize);
 *
 * note: no dictionary will be used if dict == NULL or dictSize < 8
- * Note : this prototype will be marked as deprecated and generate compilation warnings on reaching v1.5.x
 */
+ZSTD_DEPRECATED("use ZSTD_DCtx_reset + ZSTD_DCtx_loadDictionary, see zstd.h for detailed instructions")
 ZSTDLIB_STATIC_API size_t ZSTD_initDStream_usingDict(ZSTD_DStream* zds, const void* dict, size_t dictSize);
 
 /*!
@@ -2330,8 +2751,8 @@ ZSTDLIB_STATIC_API size_t ZSTD_initDStream_usingDict(ZSTD_DStream* zds, const vo
 *     ZSTD_DCtx_refDDict(zds, ddict);
 *
 * note : ddict is referenced, it must outlive decompression session
- * Note : this prototype will be marked as deprecated and generate compilation warnings on reaching v1.5.x
 */
+ZSTD_DEPRECATED("use ZSTD_DCtx_reset + ZSTD_DCtx_refDDict, see zstd.h for detailed instructions")
 ZSTDLIB_STATIC_API size_t ZSTD_initDStream_usingDDict(ZSTD_DStream* zds, const ZSTD_DDict* ddict);
 
 /*!
@@ -2339,18 +2760,202 @@ ZSTDLIB_STATIC_API size_t ZSTD_initDStream_usingDDict(ZSTD_DStream* zds, const Z
 *
 *     ZSTD_DCtx_reset(zds, ZSTD_reset_session_only);
 *
- * re-use decompression parameters from previous init; saves dictionary loading
- * Note : this prototype will be marked as deprecated and generate compilation warnings on reaching v1.5.x
+ * reuse decompression parameters from previous init; saves dictionary loading
 */
+ZSTD_DEPRECATED("use ZSTD_DCtx_reset, see zstd.h for detailed instructions")
 ZSTDLIB_STATIC_API size_t ZSTD_resetDStream(ZSTD_DStream* zds);
 
 
+/* ********************* BLOCK-LEVEL SEQUENCE PRODUCER API *********************
+ *
+ * *** OVERVIEW ***
+ * The Block-Level Sequence Producer API allows users to provide their own custom
+ * sequence producer which libzstd invokes to process each block. The produced list
+ * of sequences (literals and matches) is then post-processed by libzstd to produce
+ * valid compressed blocks.
+ *
+ * This block-level offload API is a more granular complement of the existing
+ * frame-level offload API compressSequences() (introduced in v1.5.1). It offers
+ * an easier migration story for applications already integrated with libzstd: the
+ * user application continues to invoke the same compression functions
+ * ZSTD_compress2() or ZSTD_compressStream2() as usual, and transparently benefits
+ * from the specific advantages of the external sequence producer. For example,
+ * the sequence producer could be tuned to take advantage of known characteristics
+ * of the input, to offer better speed / ratio, or could leverage hardware
+ * acceleration not available within libzstd itself.
+ *
+ * See contrib/externalSequenceProducer for an example program employing the
+ * Block-Level Sequence Producer API.
+ *
+ * *** USAGE ***
+ * The user is responsible for implementing a function of type
+ * ZSTD_sequenceProducer_F. For each block, zstd will pass the following
+ * arguments to the user-provided function:
+ *
+ *   - sequenceProducerState: a pointer to a user-managed state for the sequence
+ *     producer.
+ *
+ *   - outSeqs, outSeqsCapacity: an output buffer for the sequence producer.
+ *     outSeqsCapacity is guaranteed >= ZSTD_sequenceBound(srcSize). The memory
+ *     backing outSeqs is managed by the CCtx.
+ *
+ *   - src, srcSize: an input buffer for the sequence producer to parse.
+ *     srcSize is guaranteed to be <= ZSTD_BLOCKSIZE_MAX.
+ *
+ *   - dict, dictSize: a history buffer, which may be empty, which the sequence
+ *     producer may reference as it parses the src buffer. Currently, zstd will
+ *     always pass dictSize == 0 into external sequence producers, but this will
+ *     change in the future.
+ *
+ *   - compressionLevel: a signed integer representing the zstd compression level
+ *     set by the user for the current operation. The sequence producer may choose
+ *     to use this information to change its compression strategy and speed/ratio
+ *     tradeoff. Note: the compression level does not reflect zstd parameters set
+ *     through the advanced API.
+ *
+ *   - windowSize: a size_t representing the maximum allowed offset for external
+ *     sequences. Note that sequence offsets are sometimes allowed to exceed the
+ *     windowSize if a dictionary is present, see doc/zstd_compression_format.md
+ *     for details.
+ *
+ * The user-provided function shall return a size_t representing the number of
+ * sequences written to outSeqs. This return value will be treated as an error
+ * code if it is greater than outSeqsCapacity. The return value must be non-zero
+ * if srcSize is non-zero. The ZSTD_SEQUENCE_PRODUCER_ERROR macro is provided
+ * for convenience, but any value greater than outSeqsCapacity will be treated as
+ * an error code.
+ *
+ * If the user-provided function does not return an error code, the sequences
+ * written to outSeqs must be a valid parse of the src buffer. Data corruption may
+ * occur if the parse is not valid. A parse is defined to be valid if the
+ * following conditions hold:
+ *   - The sum of matchLengths and literalLengths must equal srcSize.
+ *   - All sequences in the parse, except for the final sequence, must have
+ *     matchLength >= ZSTD_MINMATCH_MIN. The final sequence must have
+ *     matchLength >= ZSTD_MINMATCH_MIN or matchLength == 0.
+ *   - All offsets must respect the windowSize parameter as specified in
+ *     doc/zstd_compression_format.md.
+ *   - If the final sequence has matchLength == 0, it must also have offset == 0.
+ *
+ * zstd will only validate these conditions (and fail compression if they do not
+ * hold) if the ZSTD_c_validateSequences cParam is enabled. Note that sequence
+ * validation has a performance cost.
+ *
+ * If the user-provided function returns an error, zstd will either fall back
+ * to an internal sequence producer or fail the compression operation. The user can
+ * choose between the two behaviors by setting the ZSTD_c_enableSeqProducerFallback
+ * cParam. Fallback compression will follow any other cParam settings, such as
+ * compression level, the same as in a normal compression operation.
+ *
+ * The user shall instruct zstd to use a particular ZSTD_sequenceProducer_F
+ * function by calling
+ *         ZSTD_registerSequenceProducer(cctx,
+ *                                       sequenceProducerState,
+ *                                       sequenceProducer)
+ * This setting will persist until the next parameter reset of the CCtx.
+ *
+ * The sequenceProducerState must be initialized by the user before calling
+ * ZSTD_registerSequenceProducer(). The user is responsible for destroying the
+ * sequenceProducerState.
+ *
+ * *** LIMITATIONS ***
+ * This API is compatible with all zstd compression APIs which respect advanced parameters.
+ * However, there are three limitations:
+ *
+ * First, the ZSTD_c_enableLongDistanceMatching cParam is not currently supported.
+ * COMPRESSION WILL FAIL if it is enabled and the user tries to compress with a block-level
+ * external sequence producer.
+ *   - Note that ZSTD_c_enableLongDistanceMatching is auto-enabled by default in some
+ *     cases (see its documentation for details). Users must explicitly set
+ *     ZSTD_c_enableLongDistanceMatching to ZSTD_ps_disable in such cases if an external
+ *     sequence producer is registered.
+ *   - As of this writing, ZSTD_c_enableLongDistanceMatching is disabled by default
+ *     whenever ZSTD_c_windowLog < 128MB, but that's subject to change. Users should
+ *     check the docs on ZSTD_c_enableLongDistanceMatching whenever the Block-Level Sequence
+ *     Producer API is used in conjunction with advanced settings (like ZSTD_c_windowLog).
+ *
+ * Second, history buffers are not currently supported. Concretely, zstd will always pass
+ * dictSize == 0 to the external sequence producer (for now). This has two implications:
+ *   - Dictionaries are not currently supported. Compression will *not* fail if the user
+ *     references a dictionary, but the dictionary won't have any effect.
+ *   - Stream history is not currently supported. All advanced compression APIs, including
+ *     streaming APIs, work with external sequence producers, but each block is treated as
+ *     an independent chunk without history from previous blocks.
+ *
+ * Third, multi-threading within a single compression is not currently supported. In other words,
+ * COMPRESSION WILL FAIL if ZSTD_c_nbWorkers > 0 and an external sequence producer is registered.
+ * Multi-threading across compressions is fine: simply create one CCtx per thread.
+ *
+ * Long-term, we plan to overcome all three limitations. There is no technical blocker to
+ * overcoming them. It is purely a question of engineering effort.
+ */
+
+#define ZSTD_SEQUENCE_PRODUCER_ERROR ((size_t)(-1))
+
+typedef size_t (*ZSTD_sequenceProducer_F) (
+    void* sequenceProducerState,
+    ZSTD_Sequence* outSeqs, size_t outSeqsCapacity,
+    const void* src, size_t srcSize,
+    const void* dict, size_t dictSize,
+    int compressionLevel,
+    size_t windowSize
+);
+
+/*! ZSTD_registerSequenceProducer() :
+ * Instruct zstd to use a block-level external sequence producer function.
+ *
+ * The sequenceProducerState must be initialized by the caller, and the caller is
+ * responsible for managing its lifetime. This parameter is sticky across
+ * compressions. It will remain set until the user explicitly resets compression
+ * parameters.
+ *
+ * Sequence producer registration is considered to be an "advanced parameter",
+ * part of the "advanced API". This means it will only have an effect on compression
+ * APIs which respect advanced parameters, such as compress2() and compressStream2().
+ * Older compression APIs such as compressCCtx(), which predate the introduction of
+ * "advanced parameters", will ignore any external sequence producer setting.
+ *
+ * The sequence producer can be "cleared" by registering a NULL function pointer. This
+ * removes all limitations described above in the "LIMITATIONS" section of the API docs.
+ *
+ * The user is strongly encouraged to read the full API documentation (above) before
+ * calling this function. */
+ZSTDLIB_STATIC_API void
+ZSTD_registerSequenceProducer(
+    ZSTD_CCtx* cctx,
+    void* sequenceProducerState,
+    ZSTD_sequenceProducer_F sequenceProducer
+);
+
+/*! ZSTD_CCtxParams_registerSequenceProducer() :
+ * Same as ZSTD_registerSequenceProducer(), but operates on ZSTD_CCtx_params.
+ * This is used for accurate size estimation with ZSTD_estimateCCtxSize_usingCCtxParams(),
+ * which is needed when creating a ZSTD_CCtx with ZSTD_initStaticCCtx().
+ *
+ * If you are using the external sequence producer API in a scenario where ZSTD_initStaticCCtx()
+ * is required, then this function is for you. Otherwise, you probably don't need it.
+ *
+ * See tests/zstreamtest.c for example usage. */
+ZSTDLIB_STATIC_API void
+ZSTD_CCtxParams_registerSequenceProducer(
+  ZSTD_CCtx_params* params,
+  void* sequenceProducerState,
+  ZSTD_sequenceProducer_F sequenceProducer
+);
+
+
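To make the producer contract concrete, a hedged skeleton (not from upstream) of a trivial producer that declares each block as all literals. Per the validity rules above, a single final sequence with offset == 0 and matchLength == 0 whose litLength equals srcSize is always a valid parse.

    #include <zstd.h>

    /* Sketch: a minimal, always-valid sequence producer. */
    static size_t allLiteralsProducer(void* state,
                                      ZSTD_Sequence* outSeqs, size_t outSeqsCapacity,
                                      const void* src, size_t srcSize,
                                      const void* dict, size_t dictSize,
                                      int compressionLevel, size_t windowSize)
    {
        (void)state; (void)src; (void)dict; (void)dictSize;
        (void)compressionLevel; (void)windowSize;
        if (outSeqsCapacity < 1) return ZSTD_SEQUENCE_PRODUCER_ERROR;
        outSeqs[0].offset = 0;                    /* final sequence: no match */
        outSeqs[0].litLength = (unsigned)srcSize; /* whole block is literals */
        outSeqs[0].matchLength = 0;
        outSeqs[0].rep = 0;                       /* currently ignored by zstd */
        return 1;                                 /* one sequence written */
    }

    /* Registration (the producer state is unused here, hence NULL):
     *   ZSTD_registerSequenceProducer(cctx, NULL, allLiteralsProducer);
     */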
 /* *******************************************************************
-*  Buffer-less and synchronous inner streaming functions
+*  Buffer-less and synchronous inner streaming functions (DEPRECATED)
+*
+*  This API is deprecated, and will be removed in a future version.
+*  It allows streaming (de)compression with user allocated buffers.
+*  However, it is hard to use, and not as well tested as the rest of
+*  our API.
*
-*  This is an advanced API, giving full control over buffer management, for users which need direct control over memory.
-*  But it's also a complex one, with several restrictions, documented below.
-*  Prefer normal streaming API for an easier experience.
+*  Please use the normal streaming API instead: ZSTD_compressStream2,
+*  and ZSTD_decompressStream.
+*  If there is functionality that you need, but it doesn't provide,
+*  please open an issue on our GitHub.
 ********************************************************************* */
 
 /*
@@ -2358,11 +2963,10 @@ ZSTDLIB_STATIC_API size_t ZSTD_resetDStream(ZSTD_DStream* zds);
 
   A ZSTD_CCtx object is required to track streaming operations.
   Use ZSTD_createCCtx() / ZSTD_freeCCtx() to manage resource.
-  ZSTD_CCtx object can be re-used multiple times within successive compression operations.
+  ZSTD_CCtx object can be reused multiple times within successive compression operations.
 
   Start by initializing a context.
   Use ZSTD_compressBegin(), or ZSTD_compressBegin_usingDict() for dictionary compression.
-  It's also possible to duplicate a reference context which has already been initialized, using ZSTD_copyCCtx()
 
   Then, consume your input using ZSTD_compressContinue().
   There are some important considerations to keep in mind when using this advanced function :
@@ -2380,39 +2984,49 @@ ZSTDLIB_STATIC_API size_t ZSTD_resetDStream(ZSTD_DStream* zds);
   It's possible to use srcSize==0, in which case, it will write a final empty block to end the frame.
   Without last block mark, frames are considered unfinished (hence corrupted) by compliant decoders.
 
-  `ZSTD_CCtx` object can be re-used (ZSTD_compressBegin()) to compress again.
+  `ZSTD_CCtx` object can be reused (ZSTD_compressBegin()) to compress again.
 */
 
 /*=====   Buffer-less streaming compression functions  =====*/
+ZSTD_DEPRECATED("The buffer-less API is deprecated in favor of the normal streaming API. See docs.")
 ZSTDLIB_STATIC_API size_t ZSTD_compressBegin(ZSTD_CCtx* cctx, int compressionLevel);
+ZSTD_DEPRECATED("The buffer-less API is deprecated in favor of the normal streaming API. See docs.")
 ZSTDLIB_STATIC_API size_t ZSTD_compressBegin_usingDict(ZSTD_CCtx* cctx, const void* dict, size_t dictSize, int compressionLevel);
+ZSTD_DEPRECATED("The buffer-less API is deprecated in favor of the normal streaming API. See docs.")
 ZSTDLIB_STATIC_API size_t ZSTD_compressBegin_usingCDict(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict); /*< note: fails if cdict==NULL */
-ZSTDLIB_STATIC_API size_t ZSTD_copyCCtx(ZSTD_CCtx* cctx, const ZSTD_CCtx* preparedCCtx, unsigned long long pledgedSrcSize); /*< note: if pledgedSrcSize is not known, use ZSTD_CONTENTSIZE_UNKNOWN */
 
+ZSTD_DEPRECATED("This function will likely be removed in a future release. It is misleading and has very limited utility.")
+ZSTDLIB_STATIC_API
+size_t ZSTD_copyCCtx(ZSTD_CCtx* cctx, const ZSTD_CCtx* preparedCCtx, unsigned long long pledgedSrcSize); /*< note: if pledgedSrcSize is not known, use ZSTD_CONTENTSIZE_UNKNOWN */
+
+ZSTD_DEPRECATED("The buffer-less API is deprecated in favor of the normal streaming API. See docs.")
 ZSTDLIB_STATIC_API size_t ZSTD_compressContinue(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize);
See docs.") ZSTDLIB_STATIC_API size_t ZSTD_compressEnd(ZSTD_CCtx* cctx, void* dst, siz= e_t dstCapacity, const void* src, size_t srcSize); =20 /* The ZSTD_compressBegin_advanced() and ZSTD_compressBegin_usingCDict_adv= anced() are now DEPRECATED and will generate a compiler warning */ ZSTD_DEPRECATED("use advanced API to access custom parameters") +ZSTDLIB_STATIC_API size_t ZSTD_compressBegin_advanced(ZSTD_CCtx* cctx, const void* dict, size= _t dictSize, ZSTD_parameters params, unsigned long long pledgedSrcSize); /*= < pledgedSrcSize : If srcSize is not known at init time, use ZSTD_CONTENTSI= ZE_UNKNOWN */ ZSTD_DEPRECATED("use advanced API to access custom parameters") +ZSTDLIB_STATIC_API size_t ZSTD_compressBegin_usingCDict_advanced(ZSTD_CCtx* const cctx, const= ZSTD_CDict* const cdict, ZSTD_frameParameters const fParams, unsigned long= long const pledgedSrcSize); /* compression parameters are already set wi= thin cdict. pledgedSrcSize must be correct. If srcSize is not known, use ma= cro ZSTD_CONTENTSIZE_UNKNOWN */ /* Buffer-less streaming decompression (synchronous mode) =20 A ZSTD_DCtx object is required to track streaming operations. Use ZSTD_createDCtx() / ZSTD_freeDCtx() to manage it. - A ZSTD_DCtx object can be re-used multiple times. + A ZSTD_DCtx object can be reused multiple times. =20 First typical operation is to retrieve frame parameters, using ZSTD_getF= rameHeader(). Frame header is extracted from the beginning of compressed frame, so pro= viding only the frame's beginning is enough. Data fragment must be large enough to ensure successful decoding. `ZSTD_frameHeaderSize_max` bytes is guaranteed to always be large enough. - @result : 0 : successful decoding, the `ZSTD_frameHeader` structure is c= orrectly filled. - >0 : `srcSize` is too small, please provide at least @result by= tes on next attempt. + result : 0 : successful decoding, the `ZSTD_frameHeader` structure is c= orrectly filled. + >0 : `srcSize` is too small, please provide at least result byt= es on next attempt. errorCode, which can be tested using ZSTD_isError(). =20 - It fills a ZSTD_frameHeader structure with important information to corr= ectly decode the frame, + It fills a ZSTD_FrameHeader structure with important information to corr= ectly decode the frame, such as the dictionary ID, content size, or maximum back-reference dista= nce (`windowSize`). Note that these values could be wrong, either because of data corruption= , or because a 3rd party deliberately spoofs false information. As a consequence, check that values remain within valid application rang= e. @@ -2428,7 +3042,7 @@ size_t ZSTD_compressBegin_usingCDict_advanced(ZSTD_CC= tx* const cctx, const ZSTD_ =20 The most memory efficient way is to use a round buffer of sufficient siz= e. Sufficient size is determined by invoking ZSTD_decodingBufferSize_min(), - which can @return an error code if required value is too large for curre= nt system (in 32-bits mode). + which can return an error code if required value is too large for curren= t system (in 32-bits mode). In a round buffer methodology, ZSTD_decompressContinue() decompresses ea= ch block next to previous one, up to the moment there is not enough room left in the buffer to guarante= e decoding another full block, which maximum size is provided in `ZSTD_frameHeader` structure, field `b= lockSizeMax`. 
@@ -2448,7 +3062,7 @@ size_t ZSTD_compressBegin_usingCDict_advanced(ZSTD_CC= tx* const cctx, const ZSTD_ ZSTD_nextSrcSizeToDecompress() tells how many bytes to provide as 'srcSi= ze' to ZSTD_decompressContinue(). ZSTD_decompressContinue() requires this _exact_ amount of bytes, or it w= ill fail. =20 - @result of ZSTD_decompressContinue() is the number of bytes regenerated w= ithin 'dst' (necessarily <=3D dstCapacity). + result of ZSTD_decompressContinue() is the number of bytes regenerated w= ithin 'dst' (necessarily <=3D dstCapacity). It can be zero : it just means ZSTD_decompressContinue() has decoded som= e metadata item. It can also be an error code, which can be tested with ZSTD_isError(). =20 @@ -2471,27 +3085,7 @@ size_t ZSTD_compressBegin_usingCDict_advanced(ZSTD_C= Ctx* const cctx, const ZSTD_ */ =20 /*=3D=3D=3D=3D=3D Buffer-less streaming decompression functions =3D=3D= =3D=3D=3D*/ -typedef enum { ZSTD_frame, ZSTD_skippableFrame } ZSTD_frameType_e; -typedef struct { - unsigned long long frameContentSize; /* if =3D=3D ZSTD_CONTENTSIZE_UNK= NOWN, it means this field is not available. 0 means "empty" */ - unsigned long long windowSize; /* can be very large, up to <=3D = frameContentSize */ - unsigned blockSizeMax; - ZSTD_frameType_e frameType; /* if =3D=3D ZSTD_skippableFrame,= frameContentSize is the size of skippable content */ - unsigned headerSize; - unsigned dictID; - unsigned checksumFlag; -} ZSTD_frameHeader; =20 -/*! ZSTD_getFrameHeader() : - * decode Frame Header, or requires larger `srcSize`. - * @return : 0, `zfhPtr` is correctly filled, - * >0, `srcSize` is too small, value is wanted `srcSize` amount, - * or an error code, which can be tested using ZSTD_isError() */ -ZSTDLIB_STATIC_API size_t ZSTD_getFrameHeader(ZSTD_frameHeader* zfhPtr, co= nst void* src, size_t srcSize); /*< doesn't consume input */ -/*! ZSTD_getFrameHeader_advanced() : - * same as ZSTD_getFrameHeader(), - * with added capability to select a format (like ZSTD_f_zstd1_magicless)= */ -ZSTDLIB_STATIC_API size_t ZSTD_getFrameHeader_advanced(ZSTD_frameHeader* z= fhPtr, const void* src, size_t srcSize, ZSTD_format_e format); ZSTDLIB_STATIC_API size_t ZSTD_decodingBufferSize_min(unsigned long long w= indowSize, unsigned long long frameContentSize); /*< when frame content si= ze is not known, pass in frameContentSize =3D=3D ZSTD_CONTENTSIZE_UNKNOWN */ =20 ZSTDLIB_STATIC_API size_t ZSTD_decompressBegin(ZSTD_DCtx* dctx); @@ -2502,6 +3096,7 @@ ZSTDLIB_STATIC_API size_t ZSTD_nextSrcSizeToDecompres= s(ZSTD_DCtx* dctx); ZSTDLIB_STATIC_API size_t ZSTD_decompressContinue(ZSTD_DCtx* dctx, void* d= st, size_t dstCapacity, const void* src, size_t srcSize); =20 /* misc */ +ZSTD_DEPRECATED("This function will likely be removed in the next minor re= lease. 
It is misleading and has very limited utility.") ZSTDLIB_STATIC_API void ZSTD_copyDCtx(ZSTD_DCtx* dctx, const ZSTD_DCtx* = preparedDCtx); typedef enum { ZSTDnit_frameHeader, ZSTDnit_blockHeader, ZSTDnit_block, ZS= TDnit_lastBlock, ZSTDnit_checksum, ZSTDnit_skippableFrame } ZSTD_nextInputT= ype_e; ZSTDLIB_STATIC_API ZSTD_nextInputType_e ZSTD_nextInputType(ZSTD_DCtx* dctx= ); @@ -2509,11 +3104,23 @@ ZSTDLIB_STATIC_API ZSTD_nextInputType_e ZSTD_nextIn= putType(ZSTD_DCtx* dctx); =20 =20 =20 -/* =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D */ -/* Block level API */ -/* =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D */ +/* =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D */ +/* Block level API (DEPRECATED) */ +/* =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D */ =20 /*! + + This API is deprecated in favor of the regular compression API. + You can get the frame header down to 2 bytes by setting: + - ZSTD_c_format =3D ZSTD_f_zstd1_magicless + - ZSTD_c_contentSizeFlag =3D 0 + - ZSTD_c_checksumFlag =3D 0 + - ZSTD_c_dictIDFlag =3D 0 + + This API is not as well tested as our normal API, so we recommend not = using it. + We will be removing it in a future version. If the normal API doesn't = provide + the functionality you need, please open a GitHub issue. + Block functions produce and decode raw zstd blocks, without frame meta= data. Frame metadata cost is typically ~12 bytes, which can be non-negligibl= e for very small blocks (< 100 bytes). But users will have to take in charge needed metadata to regenerate da= ta, such as compressed and content sizes. @@ -2524,7 +3131,6 @@ ZSTDLIB_STATIC_API ZSTD_nextInputType_e ZSTD_nextInpu= tType(ZSTD_DCtx* dctx); - It is necessary to init context before starting + compression : any ZSTD_compressBegin*() variant, including with di= ctionary + decompression : any ZSTD_decompressBegin*() variant, including wit= h dictionary - + copyCCtx() and copyDCtx() can be used too - Block size is limited, it must be <=3D ZSTD_getBlockSize() <=3D ZSTD= _BLOCKSIZE_MAX =3D=3D 128 KB + If input is larger than a block size, it's necessary to split inpu= t data into multiple blocks + For inputs larger than a single block, consider using regular ZSTD= _compress() instead. @@ -2541,11 +3147,14 @@ ZSTDLIB_STATIC_API ZSTD_nextInputType_e ZSTD_nextIn= putType(ZSTD_DCtx* dctx); */ =20 /*=3D=3D=3D=3D=3D Raw zstd block functions =3D=3D=3D=3D=3D*/ +ZSTD_DEPRECATED("The block API is deprecated in favor of the normal compre= ssion API. See docs.") ZSTDLIB_STATIC_API size_t ZSTD_getBlockSize (const ZSTD_CCtx* cctx); +ZSTD_DEPRECATED("The block API is deprecated in favor of the normal compre= ssion API. See docs.") ZSTDLIB_STATIC_API size_t ZSTD_compressBlock (ZSTD_CCtx* cctx, void* dst,= size_t dstCapacity, const void* src, size_t srcSize); +ZSTD_DEPRECATED("The block API is deprecated in favor of the normal compre= ssion API. See docs.") ZSTDLIB_STATIC_API size_t ZSTD_decompressBlock(ZSTD_DCtx* dctx, void* dst,= size_t dstCapacity, const void* src, size_t srcSize); +ZSTD_DEPRECATED("The block API is deprecated in favor of the normal compre= ssion API. See docs.") ZSTDLIB_STATIC_API size_t ZSTD_insertBlock (ZSTD_DCtx* dctx, const void= * blockStart, size_t blockSize); /*< insert uncompressed block into `dctx`= history. Useful for multi-blocks decompression. 
*/ =20 =20 #endif /* ZSTD_H_ZSTD_STATIC_LINKING_ONLY */ - diff --git a/lib/zstd/Makefile b/lib/zstd/Makefile index 20f08c644b71..be218b5e0ed5 100644 --- a/lib/zstd/Makefile +++ b/lib/zstd/Makefile @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause # ################################################################ -# Copyright (c) Facebook, Inc. +# Copyright (c) Meta Platforms, Inc. and affiliates. # All rights reserved. # # This source code is licensed under both the BSD-style license (found in = the @@ -26,6 +26,7 @@ zstd_compress-y :=3D \ compress/zstd_lazy.o \ compress/zstd_ldm.o \ compress/zstd_opt.o \ + compress/zstd_preSplit.o \ =20 zstd_decompress-y :=3D \ zstd_decompress_module.o \ diff --git a/lib/zstd/common/allocations.h b/lib/zstd/common/allocations.h new file mode 100644 index 000000000000..16c3d08e8d1a --- /dev/null +++ b/lib/zstd/common/allocations.h @@ -0,0 +1,56 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ +/* + * Copyright (c) Meta Platforms, Inc. and affiliates. + * All rights reserved. + * + * This source code is licensed under both the BSD-style license (found in= the + * LICENSE file in the root directory of this source tree) and the GPLv2 (= found + * in the COPYING file in the root directory of this source tree). + * You may select, at your option, one of the above-listed licenses. + */ + +/* This file provides custom allocation primitives + */ + +#define ZSTD_DEPS_NEED_MALLOC +#include "zstd_deps.h" /* ZSTD_malloc, ZSTD_calloc, ZSTD_free, ZSTD_mems= et */ + +#include "compiler.h" /* MEM_STATIC */ +#define ZSTD_STATIC_LINKING_ONLY +#include /* ZSTD_customMem */ + +#ifndef ZSTD_ALLOCATIONS_H +#define ZSTD_ALLOCATIONS_H + +/* custom memory allocation functions */ + +MEM_STATIC void* ZSTD_customMalloc(size_t size, ZSTD_customMem customMem) +{ + if (customMem.customAlloc) + return customMem.customAlloc(customMem.opaque, size); + return ZSTD_malloc(size); +} + +MEM_STATIC void* ZSTD_customCalloc(size_t size, ZSTD_customMem customMem) +{ + if (customMem.customAlloc) { + /* calloc implemented as malloc+memset; + * not as efficient as calloc, but next best guess for custom mall= oc */ + void* const ptr =3D customMem.customAlloc(customMem.opaque, size); + ZSTD_memset(ptr, 0, size); + return ptr; + } + return ZSTD_calloc(1, size); +} + +MEM_STATIC void ZSTD_customFree(void* ptr, ZSTD_customMem customMem) +{ + if (ptr!=3DNULL) { + if (customMem.customFree) + customMem.customFree(customMem.opaque, ptr); + else + ZSTD_free(ptr); + } +} + +#endif /* ZSTD_ALLOCATIONS_H */ diff --git a/lib/zstd/common/bits.h b/lib/zstd/common/bits.h new file mode 100644 index 000000000000..c5faaa3d7b08 --- /dev/null +++ b/lib/zstd/common/bits.h @@ -0,0 +1,150 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ +/* + * Copyright (c) Meta Platforms, Inc. and affiliates. + * All rights reserved. + * + * This source code is licensed under both the BSD-style license (found in= the + * LICENSE file in the root directory of this source tree) and the GPLv2 (= found + * in the COPYING file in the root directory of this source tree). + * You may select, at your option, one of the above-listed licenses. 
+ */ + +#ifndef ZSTD_BITS_H +#define ZSTD_BITS_H + +#include "mem.h" + +MEM_STATIC unsigned ZSTD_countTrailingZeros32_fallback(U32 val) +{ + assert(val !=3D 0); + { + static const U32 DeBruijnBytePos[32] =3D {0, 1, 28, 2, 29, 14, 24,= 3, + 30, 22, 20, 15, 25, 17, 4,= 8, + 31, 27, 13, 23, 21, 19, 16= , 7, + 26, 12, 18, 6, 11, 5, 10, = 9}; + return DeBruijnBytePos[((U32) ((val & -(S32) val) * 0x077CB531U)) = >> 27]; + } +} + +MEM_STATIC unsigned ZSTD_countTrailingZeros32(U32 val) +{ + assert(val !=3D 0); +#if (__GNUC__ >=3D 4) + return (unsigned)__builtin_ctz(val); +#else + return ZSTD_countTrailingZeros32_fallback(val); +#endif +} + +MEM_STATIC unsigned ZSTD_countLeadingZeros32_fallback(U32 val) +{ + assert(val !=3D 0); + { + static const U32 DeBruijnClz[32] =3D {0, 9, 1, 10, 13, 21, 2, 29, + 11, 14, 16, 18, 22, 25, 3, 30, + 8, 12, 20, 28, 15, 17, 24, 7, + 19, 27, 23, 6, 26, 5, 4, 31}; + val |=3D val >> 1; + val |=3D val >> 2; + val |=3D val >> 4; + val |=3D val >> 8; + val |=3D val >> 16; + return 31 - DeBruijnClz[(val * 0x07C4ACDDU) >> 27]; + } +} + +MEM_STATIC unsigned ZSTD_countLeadingZeros32(U32 val) +{ + assert(val !=3D 0); +#if (__GNUC__ >=3D 4) + return (unsigned)__builtin_clz(val); +#else + return ZSTD_countLeadingZeros32_fallback(val); +#endif +} + +MEM_STATIC unsigned ZSTD_countTrailingZeros64(U64 val) +{ + assert(val !=3D 0); +#if (__GNUC__ >=3D 4) && defined(__LP64__) + return (unsigned)__builtin_ctzll(val); +#else + { + U32 mostSignificantWord =3D (U32)(val >> 32); + U32 leastSignificantWord =3D (U32)val; + if (leastSignificantWord =3D=3D 0) { + return 32 + ZSTD_countTrailingZeros32(mostSignificantWord); + } else { + return ZSTD_countTrailingZeros32(leastSignificantWord); + } + } +#endif +} + +MEM_STATIC unsigned ZSTD_countLeadingZeros64(U64 val) +{ + assert(val !=3D 0); +#if (__GNUC__ >=3D 4) + return (unsigned)(__builtin_clzll(val)); +#else + { + U32 mostSignificantWord =3D (U32)(val >> 32); + U32 leastSignificantWord =3D (U32)val; + if (mostSignificantWord =3D=3D 0) { + return 32 + ZSTD_countLeadingZeros32(leastSignificantWord); + } else { + return ZSTD_countLeadingZeros32(mostSignificantWord); + } + } +#endif +} + +MEM_STATIC unsigned ZSTD_NbCommonBytes(size_t val) +{ + if (MEM_isLittleEndian()) { + if (MEM_64bits()) { + return ZSTD_countTrailingZeros64((U64)val) >> 3; + } else { + return ZSTD_countTrailingZeros32((U32)val) >> 3; + } + } else { /* Big Endian CPU */ + if (MEM_64bits()) { + return ZSTD_countLeadingZeros64((U64)val) >> 3; + } else { + return ZSTD_countLeadingZeros32((U32)val) >> 3; + } + } +} + +MEM_STATIC unsigned ZSTD_highbit32(U32 val) /* compress, dictBuilder, de= codeCorpus */ +{ + assert(val !=3D 0); + return 31 - ZSTD_countLeadingZeros32(val); +} + +/* ZSTD_rotateRight_*(): + * Rotates a bitfield to the right by "count" bits. 
+ * https://en.wikipedia.org/w/index.php?title=3DCircular_shift&oldid=3D991= 635599#Implementing_circular_shifts + */ +MEM_STATIC +U64 ZSTD_rotateRight_U64(U64 const value, U32 count) { + assert(count < 64); + count &=3D 0x3F; /* for fickle pattern recognition */ + return (value >> count) | (U64)(value << ((0U - count) & 0x3F)); +} + +MEM_STATIC +U32 ZSTD_rotateRight_U32(U32 const value, U32 count) { + assert(count < 32); + count &=3D 0x1F; /* for fickle pattern recognition */ + return (value >> count) | (U32)(value << ((0U - count) & 0x1F)); +} + +MEM_STATIC +U16 ZSTD_rotateRight_U16(U16 const value, U32 count) { + assert(count < 16); + count &=3D 0x0F; /* for fickle pattern recognition */ + return (value >> count) | (U16)(value << ((0U - count) & 0x0F)); +} + +#endif /* ZSTD_BITS_H */ diff --git a/lib/zstd/common/bitstream.h b/lib/zstd/common/bitstream.h index feef3a1b1d60..86439da0eea7 100644 --- a/lib/zstd/common/bitstream.h +++ b/lib/zstd/common/bitstream.h @@ -1,7 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* ****************************************************************** * bitstream * Part of FSE library - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * * You can contact the author at : * - Source repository : https://github.com/Cyan4973/FiniteStateEntropy @@ -27,7 +28,7 @@ #include "compiler.h" /* UNLIKELY() */ #include "debug.h" /* assert(), DEBUGLOG(), RAWLOG() */ #include "error_private.h" /* error codes and messages */ - +#include "bits.h" /* ZSTD_highbit32 */ =20 /*=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D * Target specific @@ -41,12 +42,13 @@ /*-****************************************** * bitStream encoding API (write forward) ********************************************/ +typedef size_t BitContainerType; /* bitStream can mix input from multiple sources. * A critical property of these streams is that they encode and decode in = **reverse** direction. * So the first bit sequence you add will be the last to be read, like a L= IFO stack. */ typedef struct { - size_t bitContainer; + BitContainerType bitContainer; unsigned bitPos; char* startPtr; char* ptr; @@ -54,7 +56,7 @@ typedef struct { } BIT_CStream_t; =20 MEM_STATIC size_t BIT_initCStream(BIT_CStream_t* bitC, void* dstBuffer, si= ze_t dstCapacity); -MEM_STATIC void BIT_addBits(BIT_CStream_t* bitC, size_t value, unsigned = nbBits); +MEM_STATIC void BIT_addBits(BIT_CStream_t* bitC, BitContainerType value,= unsigned nbBits); MEM_STATIC void BIT_flushBits(BIT_CStream_t* bitC); MEM_STATIC size_t BIT_closeCStream(BIT_CStream_t* bitC); =20 @@ -63,7 +65,7 @@ MEM_STATIC size_t BIT_closeCStream(BIT_CStream_t* bitC); * `dstCapacity` must be >=3D sizeof(bitD->bitContainer), otherwise @retur= n will be an error code. * * bits are first added to a local register. -* Local register is size_t, hence 64-bits on 64-bits systems, or 32-bits = on 32-bits systems. +* Local register is BitContainerType, 64-bits on 64-bits systems, or 32-b= its on 32-bits systems. * Writing data into memory is an explicit operation, performed by the flu= shBits function. * Hence keep track how many bits are potentially stored into local regist= er to avoid register overflow. * After a flushBits, a maximum of 7 bits might still be stored into local= register. 
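To make the write-forward contract above concrete, here is a toy encoder against the declarations in this header. This is a sketch only: toy_encode is an illustrative name, not upstream code, and it assumes dstCapacity is at least sizeof(bitC.bitContainer):

    /* Accumulate two fields in the local register, then flush.
     * The matching reader returns them in reverse (LIFO) order. */
    size_t toy_encode(void* dst, size_t dstCapacity)
    {
        BIT_CStream_t bitC;
        size_t const initErr = BIT_initCStream(&bitC, dst, dstCapacity);
        if (ERR_isError(initErr)) return initErr;  /* dstCapacity too small */
        BIT_addBits(&bitC, 5, 4);   /* written first => decoded last */
        BIT_addBits(&bitC, 2, 3);   /* written last => decoded first */
        BIT_flushBits(&bitC);       /* at most 7 bits stay in the register */
        return BIT_closeCStream(&bitC);  /* 0 means dst overflowed */
    }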
@@ -80,28 +82,28 @@ MEM_STATIC size_t BIT_closeCStream(BIT_CStream_t* bitC);
 * bitStream decoding API (read backward)
 **********************************************/
typedef struct {
-    size_t   bitContainer;
+    BitContainerType bitContainer;
    unsigned bitsConsumed;
    const char* ptr;
    const char* start;
    const char* limitPtr;
} BIT_DStream_t;

-typedef enum { BIT_DStream_unfinished = 0,
-               BIT_DStream_endOfBuffer = 1,
-               BIT_DStream_completed = 2,
-               BIT_DStream_overflow = 3 } BIT_DStream_status;  /* result of BIT_reloadDStream() */
-               /* 1,2,4,8 would be better for bitmap combinations, but slows down performance a bit ... :( */
+typedef enum { BIT_DStream_unfinished = 0, /* fully refilled */
+               BIT_DStream_endOfBuffer = 1, /* still some bits left in bitstream */
+               BIT_DStream_completed = 2, /* bitstream entirely consumed, bit-exact */
+               BIT_DStream_overflow = 3 /* user requested more bits than present in bitstream */
+    } BIT_DStream_status;  /* result of BIT_reloadDStream() */

MEM_STATIC size_t   BIT_initDStream(BIT_DStream_t* bitD, const void* srcBuffer, size_t srcSize);
-MEM_STATIC size_t BIT_readBits(BIT_DStream_t* bitD, unsigned nbBits);
+MEM_STATIC BitContainerType BIT_readBits(BIT_DStream_t* bitD, unsigned nbBits);
MEM_STATIC BIT_DStream_status BIT_reloadDStream(BIT_DStream_t* bitD);
MEM_STATIC unsigned BIT_endOfDStream(const BIT_DStream_t* bitD);


/* Start by invoking BIT_initDStream().
*  A chunk of the bitStream is then stored into a local register.
-*  Local register size is 64-bits on 64-bits systems, 32-bits on 32-bits systems (size_t).
+*  Local register size is 64-bits on 64-bits systems, 32-bits on 32-bits systems (BitContainerType).
*  You can then retrieve bitFields stored into the local register, **in reverse order**.
*  Local register is explicitly reloaded from memory by the BIT_reloadDStream() method.
*  A reload guarantees a minimum of ((8*sizeof(bitD->bitContainer))-7) bits when its result is BIT_DStream_unfinished.
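And the matching toy reader, showing the reverse read order and where the BIT_DStream_status values come into play. Same caveats as the encoder sketch above (toy_decode is an illustrative name, assuming a stream produced by toy_encode()):

    /* Read back the two fields written by toy_encode(), last-written
     * first. BIT_initDStream() locates the end-mark and primes the
     * register; BIT_endOfDStream() confirms exact consumption. */
    size_t toy_decode(const void* src, size_t srcSize)
    {
        BIT_DStream_t bitD;
        size_t const initErr = BIT_initDStream(&bitD, src, srcSize);
        if (ERR_isError(initErr)) return initErr;
        {   BitContainerType const b = BIT_readBits(&bitD, 3);  /* == 2 */
            BitContainerType const a = BIT_readBits(&bitD, 4);  /* == 5 */
            (void)a; (void)b;
        }
        /* BIT_reloadDStream() would report BIT_DStream_completed here */
        return BIT_endOfDStream(&bitD) ? 0 : ERROR(corruption_detected);
    }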
@@ -113,7 +115,7 @@ MEM_STATIC unsigned BIT_endOfDStream(const BIT_DStream_= t* bitD); /*-**************************************** * unsafe API ******************************************/ -MEM_STATIC void BIT_addBitsFast(BIT_CStream_t* bitC, size_t value, unsigne= d nbBits); +MEM_STATIC void BIT_addBitsFast(BIT_CStream_t* bitC, BitContainerType valu= e, unsigned nbBits); /* faster, but works only if value is "clean", meaning all high bits above= nbBits are 0 */ =20 MEM_STATIC void BIT_flushBitsFast(BIT_CStream_t* bitC); @@ -122,33 +124,6 @@ MEM_STATIC void BIT_flushBitsFast(BIT_CStream_t* bitC); MEM_STATIC size_t BIT_readBitsFast(BIT_DStream_t* bitD, unsigned nbBits); /* faster, but works only if nbBits >=3D 1 */ =20 - - -/*-************************************************************** -* Internal functions -****************************************************************/ -MEM_STATIC unsigned BIT_highbit32 (U32 val) -{ - assert(val !=3D 0); - { -# if (__GNUC__ >=3D 3) /* Use GCC Intrinsic */ - return __builtin_clz (val) ^ 31; -# else /* Software version */ - static const unsigned DeBruijnClz[32] =3D { 0, 9, 1, 10, 13, 21,= 2, 29, - 11, 14, 16, 18, 22, 25, = 3, 30, - 8, 12, 20, 28, 15, 17, 2= 4, 7, - 19, 27, 23, 6, 26, 5, = 4, 31 }; - U32 v =3D val; - v |=3D v >> 1; - v |=3D v >> 2; - v |=3D v >> 4; - v |=3D v >> 8; - v |=3D v >> 16; - return DeBruijnClz[ (U32) (v * 0x07C4ACDDU) >> 27]; -# endif - } -} - /*=3D=3D=3D=3D=3D Local Constants =3D=3D=3D=3D=3D*/ static const unsigned BIT_mask[] =3D { 0, 1, 3, 7, 0xF, 0x1F, @@ -178,16 +153,22 @@ MEM_STATIC size_t BIT_initCStream(BIT_CStream_t* bitC, return 0; } =20 +FORCE_INLINE_TEMPLATE BitContainerType BIT_getLowerBits(BitContainerType b= itContainer, U32 const nbBits) +{ + assert(nbBits < BIT_MASK_SIZE); + return bitContainer & BIT_mask[nbBits]; +} + /*! BIT_addBits() : * can add up to 31 bits into `bitC`. * Note : does not check for register overflow ! */ MEM_STATIC void BIT_addBits(BIT_CStream_t* bitC, - size_t value, unsigned nbBits) + BitContainerType value, unsigned nbBits) { DEBUG_STATIC_ASSERT(BIT_MASK_SIZE =3D=3D 32); assert(nbBits < BIT_MASK_SIZE); assert(nbBits + bitC->bitPos < sizeof(bitC->bitContainer) * 8); - bitC->bitContainer |=3D (value & BIT_mask[nbBits]) << bitC->bitPos; + bitC->bitContainer |=3D BIT_getLowerBits(value, nbBits) << bitC->bitPo= s; bitC->bitPos +=3D nbBits; } =20 @@ -195,7 +176,7 @@ MEM_STATIC void BIT_addBits(BIT_CStream_t* bitC, * works only if `value` is _clean_, * meaning all high bits above nbBits are 0 */ MEM_STATIC void BIT_addBitsFast(BIT_CStream_t* bitC, - size_t value, unsigned nbBits) + BitContainerType value, unsigned nbBits) { assert((value>>nbBits) =3D=3D 0); assert(nbBits + bitC->bitPos < sizeof(bitC->bitContainer) * 8); @@ -242,7 +223,7 @@ MEM_STATIC size_t BIT_closeCStream(BIT_CStream_t* bitC) BIT_addBitsFast(bitC, 1, 1); /* endMark */ BIT_flushBits(bitC); if (bitC->ptr >=3D bitC->endPtr) return 0; /* overflow detected */ - return (bitC->ptr - bitC->startPtr) + (bitC->bitPos > 0); + return (size_t)(bitC->ptr - bitC->startPtr) + (bitC->bitPos > 0); } =20 =20 @@ -266,35 +247,35 @@ MEM_STATIC size_t BIT_initDStream(BIT_DStream_t* bitD= , const void* srcBuffer, si bitD->ptr =3D (const char*)srcBuffer + srcSize - sizeof(bitD->bi= tContainer); bitD->bitContainer =3D MEM_readLEST(bitD->ptr); { BYTE const lastByte =3D ((const BYTE*)srcBuffer)[srcSize-1]; - bitD->bitsConsumed =3D lastByte ? 8 - BIT_highbit32(lastByte) : = 0; /* ensures bitsConsumed is always set */ + bitD->bitsConsumed =3D lastByte ? 
8 - ZSTD_highbit32(lastByte) := 0; /* ensures bitsConsumed is always set */ if (lastByte =3D=3D 0) return ERROR(GENERIC); /* endMark not pre= sent */ } } else { bitD->ptr =3D bitD->start; bitD->bitContainer =3D *(const BYTE*)(bitD->start); switch(srcSize) { - case 7: bitD->bitContainer +=3D (size_t)(((const BYTE*)(srcBuffer)= )[6]) << (sizeof(bitD->bitContainer)*8 - 16); + case 7: bitD->bitContainer +=3D (BitContainerType)(((const BYTE*)(= srcBuffer))[6]) << (sizeof(bitD->bitContainer)*8 - 16); ZSTD_FALLTHROUGH; =20 - case 6: bitD->bitContainer +=3D (size_t)(((const BYTE*)(srcBuffer)= )[5]) << (sizeof(bitD->bitContainer)*8 - 24); + case 6: bitD->bitContainer +=3D (BitContainerType)(((const BYTE*)(= srcBuffer))[5]) << (sizeof(bitD->bitContainer)*8 - 24); ZSTD_FALLTHROUGH; =20 - case 5: bitD->bitContainer +=3D (size_t)(((const BYTE*)(srcBuffer)= )[4]) << (sizeof(bitD->bitContainer)*8 - 32); + case 5: bitD->bitContainer +=3D (BitContainerType)(((const BYTE*)(= srcBuffer))[4]) << (sizeof(bitD->bitContainer)*8 - 32); ZSTD_FALLTHROUGH; =20 - case 4: bitD->bitContainer +=3D (size_t)(((const BYTE*)(srcBuffer)= )[3]) << 24; + case 4: bitD->bitContainer +=3D (BitContainerType)(((const BYTE*)(= srcBuffer))[3]) << 24; ZSTD_FALLTHROUGH; =20 - case 3: bitD->bitContainer +=3D (size_t)(((const BYTE*)(srcBuffer)= )[2]) << 16; + case 3: bitD->bitContainer +=3D (BitContainerType)(((const BYTE*)(= srcBuffer))[2]) << 16; ZSTD_FALLTHROUGH; =20 - case 2: bitD->bitContainer +=3D (size_t)(((const BYTE*)(srcBuffer)= )[1]) << 8; + case 2: bitD->bitContainer +=3D (BitContainerType)(((const BYTE*)(= srcBuffer))[1]) << 8; ZSTD_FALLTHROUGH; =20 default: break; } { BYTE const lastByte =3D ((const BYTE*)srcBuffer)[srcSize-1]; - bitD->bitsConsumed =3D lastByte ? 8 - BIT_highbit32(lastByte) = : 0; + bitD->bitsConsumed =3D lastByte ? 8 - ZSTD_highbit32(lastByte)= : 0; if (lastByte =3D=3D 0) return ERROR(corruption_detected); /* = endMark not present */ } bitD->bitsConsumed +=3D (U32)(sizeof(bitD->bitContainer) - srcSize= )*8; @@ -303,12 +284,12 @@ MEM_STATIC size_t BIT_initDStream(BIT_DStream_t* bitD= , const void* srcBuffer, si return srcSize; } =20 -MEM_STATIC FORCE_INLINE_ATTR size_t BIT_getUpperBits(size_t bitContainer, = U32 const start) +FORCE_INLINE_TEMPLATE BitContainerType BIT_getUpperBits(BitContainerType b= itContainer, U32 const start) { return bitContainer >> start; } =20 -MEM_STATIC FORCE_INLINE_ATTR size_t BIT_getMiddleBits(size_t bitContainer,= U32 const start, U32 const nbBits) +FORCE_INLINE_TEMPLATE BitContainerType BIT_getMiddleBits(BitContainerType = bitContainer, U32 const start, U32 const nbBits) { U32 const regMask =3D sizeof(bitContainer)*8 - 1; /* if start > regMask, bitstream is corrupted, and result is undefined= */ @@ -318,26 +299,20 @@ MEM_STATIC FORCE_INLINE_ATTR size_t BIT_getMiddleBits= (size_t bitContainer, U32 c * such cpus old (pre-Haswell, 2013) and their performance is not of t= hat * importance. */ -#if defined(__x86_64__) || defined(_M_X86) +#if defined(__x86_64__) || defined(_M_X64) return (bitContainer >> (start & regMask)) & ((((U64)1) << nbBits) - 1= ); #else return (bitContainer >> (start & regMask)) & BIT_mask[nbBits]; #endif } =20 -MEM_STATIC FORCE_INLINE_ATTR size_t BIT_getLowerBits(size_t bitContainer, = U32 const nbBits) -{ - assert(nbBits < BIT_MASK_SIZE); - return bitContainer & BIT_mask[nbBits]; -} - /*! BIT_lookBits() : * Provides next n bits from local register. * local register is not modified. * On 32-bits, maxNbBits=3D=3D24. * On 64-bits, maxNbBits=3D=3D56. 
* @return : value extracted */
-MEM_STATIC  FORCE_INLINE_ATTR size_t BIT_lookBits(const BIT_DStream_t*  bitD, U32 nbBits)
+FORCE_INLINE_TEMPLATE BitContainerType BIT_lookBits(const BIT_DStream_t*  bitD, U32 nbBits)
{
    /* arbitrate between double-shift and shift+mask */
#if 1
@@ -353,14 +328,14 @@ MEM_STATIC  FORCE_INLINE_ATTR size_t BIT_lookBits(const BIT_DStream_t*  bitD, U3

/*! BIT_lookBitsFast() :
 *  unsafe version; only works if nbBits >= 1 */
-MEM_STATIC size_t BIT_lookBitsFast(const BIT_DStream_t* bitD, U32 nbBits)
+MEM_STATIC BitContainerType BIT_lookBitsFast(const BIT_DStream_t* bitD, U32 nbBits)
{
    U32 const regMask = sizeof(bitD->bitContainer)*8 - 1;
    assert(nbBits >= 1);
    return (bitD->bitContainer << (bitD->bitsConsumed & regMask)) >> (((regMask+1)-nbBits) & regMask);
}

-MEM_STATIC FORCE_INLINE_ATTR void BIT_skipBits(BIT_DStream_t* bitD, U32 nbBits)
+FORCE_INLINE_TEMPLATE void BIT_skipBits(BIT_DStream_t* bitD, U32 nbBits)
{
    bitD->bitsConsumed += nbBits;
}
@@ -369,23 +344,38 @@ MEM_STATIC FORCE_INLINE_ATTR void BIT_skipBits(BIT_DStream_t* bitD, U32 nbBits)
 *  Read (consume) next n bits from local register and update.
 *  Pay attention to not read more than nbBits contained into local register.
 * @return : extracted value. */
-MEM_STATIC FORCE_INLINE_ATTR size_t BIT_readBits(BIT_DStream_t* bitD, unsigned nbBits)
+FORCE_INLINE_TEMPLATE BitContainerType BIT_readBits(BIT_DStream_t* bitD, unsigned nbBits)
{
-    size_t const value = BIT_lookBits(bitD, nbBits);
+    BitContainerType const value = BIT_lookBits(bitD, nbBits);
    BIT_skipBits(bitD, nbBits);
    return value;
}

/*! BIT_readBitsFast() :
- *  unsafe version; only works only if nbBits >= 1 */
+ *  unsafe version; only works if nbBits >= 1 */
-MEM_STATIC size_t BIT_readBitsFast(BIT_DStream_t* bitD, unsigned nbBits)
+MEM_STATIC BitContainerType BIT_readBitsFast(BIT_DStream_t* bitD, unsigned nbBits)
{
-    size_t const value = BIT_lookBitsFast(bitD, nbBits);
+    BitContainerType const value = BIT_lookBitsFast(bitD, nbBits);
    assert(nbBits >= 1);
    BIT_skipBits(bitD, nbBits);
    return value;
}

+/*! BIT_reloadDStream_internal() :
+ *  Simple variant of BIT_reloadDStream(), with two conditions:
+ *  1. bitstream is valid : bitsConsumed <= sizeof(bitD->bitContainer)*8
+ *  2. look window is valid after shifted down : bitD->ptr >= bitD->start
+ */
+MEM_STATIC BIT_DStream_status BIT_reloadDStream_internal(BIT_DStream_t* bitD)
+{
+    assert(bitD->bitsConsumed <= sizeof(bitD->bitContainer)*8);
+    bitD->ptr -= bitD->bitsConsumed >> 3;
+    assert(bitD->ptr >= bitD->start);
+    bitD->bitsConsumed &= 7;
+    bitD->bitContainer = MEM_readLEST(bitD->ptr);
+    return BIT_DStream_unfinished;
+}
+
/*! BIT_reloadDStreamFast() :
 *  Similar to BIT_reloadDStream(), but with two differences:
 *  1. bitsConsumed <= sizeof(bitD->bitContainer)*8 must hold!
@@ -396,31 +386,35 @@ MEM_STATIC BIT_DStream_status BIT_reloadDStreamFast(BIT_DStream_t* bitD)
{
    if (UNLIKELY(bitD->ptr < bitD->limitPtr))
        return BIT_DStream_overflow;
-    assert(bitD->bitsConsumed <= sizeof(bitD->bitContainer)*8);
-    bitD->ptr -= bitD->bitsConsumed >> 3;
-    bitD->bitsConsumed &= 7;
-    bitD->bitContainer = MEM_readLEST(bitD->ptr);
-    return BIT_DStream_unfinished;
+    return BIT_reloadDStream_internal(bitD);
}

/*! BIT_reloadDStream() :
 *  Refill `bitD` from buffer previously set in BIT_initDStream() .
- *  This function is safe, it guarantees it will not read beyond src buffer.
+ *  This function is safe, it guarantees it will never read beyond src buffer.
* @return : status of `BIT_DStream_t` internal register.
*           when status == BIT_DStream_unfinished, internal register is filled with at least 25 or 57 bits */
-MEM_STATIC BIT_DStream_status BIT_reloadDStream(BIT_DStream_t* bitD)
+FORCE_INLINE_TEMPLATE BIT_DStream_status BIT_reloadDStream(BIT_DStream_t* bitD)
{
-    if (bitD->bitsConsumed > (sizeof(bitD->bitContainer)*8))  /* overflow detected, like end of stream */
+    /* note : once in overflow mode, a bitstream remains in this mode until it's reset */
+    if (UNLIKELY(bitD->bitsConsumed > (sizeof(bitD->bitContainer)*8))) {
+        static const BitContainerType zeroFilled = 0;
+        bitD->ptr = (const char*)&zeroFilled;  /* aliasing is allowed for char */
+        /* overflow detected, erroneous scenario or end of stream: no update */
        return BIT_DStream_overflow;
+    }
+
+    assert(bitD->ptr >= bitD->start);

    if (bitD->ptr >= bitD->limitPtr) {
-        return BIT_reloadDStreamFast(bitD);
+        return BIT_reloadDStream_internal(bitD);
    }
    if (bitD->ptr == bitD->start) {
+        /* reached end of bitStream => no update */
        if (bitD->bitsConsumed < sizeof(bitD->bitContainer)*8) return BIT_DStream_endOfBuffer;
        return BIT_DStream_completed;
    }
-    /* start < ptr < limitPtr */
+    /* start < ptr < limitPtr => cautious update */
    {   U32 nbBytes = bitD->bitsConsumed >> 3;
        BIT_DStream_status result = BIT_DStream_unfinished;
        if (bitD->ptr - nbBytes < bitD->start) {
@@ -442,5 +436,4 @@ MEM_STATIC unsigned BIT_endOfDStream(const BIT_DStream_t* DStream)
    return ((DStream->ptr == DStream->start) && (DStream->bitsConsumed == sizeof(DStream->bitContainer)*8));
}

-
#endif /* BITSTREAM_H_MODULE */
diff --git a/lib/zstd/common/compiler.h b/lib/zstd/common/compiler.h
index c42d39faf9bd..dc9bd15e174e 100644
--- a/lib/zstd/common/compiler.h
+++ b/lib/zstd/common/compiler.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
/*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under both the BSD-style license (found in the
@@ -11,6 +12,8 @@
#ifndef ZSTD_COMPILER_H
#define ZSTD_COMPILER_H

+#include
+
#include "portability_macros.h"

/*-*******************************************************
@@ -41,12 +44,15 @@
*/
#define WIN_CDECL

+/* UNUSED_ATTR tells the compiler it is okay if the function is unused. */
+#define UNUSED_ATTR __attribute__((unused))
+
/*
 * FORCE_INLINE_TEMPLATE is used to define C "templates", which take constant
 * parameters. They must be inlined for the compiler to eliminate the constant
 * branches.
 */
-#define FORCE_INLINE_TEMPLATE static INLINE_KEYWORD FORCE_INLINE_ATTR
+#define FORCE_INLINE_TEMPLATE static INLINE_KEYWORD FORCE_INLINE_ATTR UNUSED_ATTR
/*
 * HINT_INLINE is used to help the compiler generate better code. It is *not*
 * used for "templates", so it can be tweaked based on the compilers
@@ -61,11 +67,21 @@
#if !defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 4 && __GNUC_MINOR__ >= 8 && __GNUC__ < 5
#  define HINT_INLINE static INLINE_KEYWORD
#else
-#  define HINT_INLINE static INLINE_KEYWORD FORCE_INLINE_ATTR
+#  define HINT_INLINE FORCE_INLINE_TEMPLATE
#endif

-/* UNUSED_ATTR tells the compiler it is okay if the function is unused. */
-#define UNUSED_ATTR __attribute__((unused))
+/* "soft" inline :
+ * The compiler is free to select if it's a good idea to inline or not.
+ * The main objective is to silence compiler warnings
+ * when a defined function is included but not used.
+ * + * Note : this macro is prefixed `MEM_` because it used to be provided by = `mem.h` unit. + * Updating the prefix is probably preferable, but requires a fairly large= codemod, + * since this name is used everywhere. + */ +#ifndef MEM_STATIC /* already defined in Linux Kernel mem.h */ +#define MEM_STATIC static __inline UNUSED_ATTR +#endif =20 /* force no inlining */ #define FORCE_NOINLINE static __attribute__((__noinline__)) @@ -86,23 +102,24 @@ # define PREFETCH_L1(ptr) __builtin_prefetch((ptr), 0 /* rw=3D=3Dread */= , 3 /* locality */) # define PREFETCH_L2(ptr) __builtin_prefetch((ptr), 0 /* rw=3D=3Dread */= , 2 /* locality */) #elif defined(__aarch64__) -# define PREFETCH_L1(ptr) __asm__ __volatile__("prfm pldl1keep, %0" ::"Q= "(*(ptr))) -# define PREFETCH_L2(ptr) __asm__ __volatile__("prfm pldl2keep, %0" ::"Q= "(*(ptr))) +# define PREFETCH_L1(ptr) do { __asm__ __volatile__("prfm pldl1keep, %0"= ::"Q"(*(ptr))); } while (0) +# define PREFETCH_L2(ptr) do { __asm__ __volatile__("prfm pldl2keep, %0"= ::"Q"(*(ptr))); } while (0) #else -# define PREFETCH_L1(ptr) (void)(ptr) /* disabled */ -# define PREFETCH_L2(ptr) (void)(ptr) /* disabled */ +# define PREFETCH_L1(ptr) do { (void)(ptr); } while (0) /* disabled */ +# define PREFETCH_L2(ptr) do { (void)(ptr); } while (0) /* disabled */ #endif /* NO_PREFETCH */ =20 #define CACHELINE_SIZE 64 =20 -#define PREFETCH_AREA(p, s) { \ - const char* const _ptr =3D (const char*)(p); \ - size_t const _size =3D (size_t)(s); \ - size_t _pos; \ - for (_pos=3D0; _pos<_size; _pos+=3DCACHELINE_SIZE) { \ - PREFETCH_L2(_ptr + _pos); \ - } \ -} +#define PREFETCH_AREA(p, s) \ + do { \ + const char* const _ptr =3D (const char*)(p); \ + size_t const _size =3D (size_t)(s); \ + size_t _pos; \ + for (_pos=3D0; _pos<_size; _pos+=3DCACHELINE_SIZE) { \ + PREFETCH_L2(_ptr + _pos); \ + } \ + } while (0) =20 /* vectorization * older GCC (pre gcc-4.3 picked as the cutoff) uses a different syntax, @@ -126,16 +143,13 @@ #define UNLIKELY(x) (__builtin_expect((x), 0)) =20 #if __has_builtin(__builtin_unreachable) || (defined(__GNUC__) && (__GNUC_= _ > 4 || (__GNUC__ =3D=3D 4 && __GNUC_MINOR__ >=3D 5))) -# define ZSTD_UNREACHABLE { assert(0), __builtin_unreachable(); } +# define ZSTD_UNREACHABLE do { assert(0), __builtin_unreachable(); } whil= e (0) #else -# define ZSTD_UNREACHABLE { assert(0); } +# define ZSTD_UNREACHABLE do { assert(0); } while (0) #endif =20 /* disable warnings */ =20 -/*Like DYNAMIC_BMI2 but for compile time determination of BMI2 support*/ - - /* compile time determination of SIMD support */ =20 /* C-language Attributes are added in C23. */ @@ -158,9 +172,15 @@ #define ZSTD_FALLTHROUGH fallthrough =20 /*-************************************************************** -* Alignment check +* Alignment *****************************************************************/ =20 +/* @return 1 if @u is a 2^n value, 0 otherwise + * useful to check a value is valid for alignment restrictions */ +MEM_STATIC int ZSTD_isPower2(size_t u) { + return (u & (u-1)) =3D=3D 0; +} + /* this test was initially positioned in mem.h, * but this file is removed (or replaced) for linux kernel * so it's now hosted in compiler.h, @@ -175,10 +195,95 @@ =20 #endif /* ZSTD_ALIGNOF */ =20 +#ifndef ZSTD_ALIGNED +/* C90-compatible alignment macro (GCC/Clang). Adjust for other compilers = if needed. 
*/
+#define ZSTD_ALIGNED(a) __attribute__((aligned(a)))
+#endif /* ZSTD_ALIGNED */
+
+
/*-**************************************************************
* Sanitizer
*****************************************************************/

+/*
+ * Zstd relies on pointer overflow in its decompressor.
+ * We add this attribute to functions that rely on pointer overflow.
+ */
+#ifndef ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+# if __has_attribute(no_sanitize)
+#  if !defined(__clang__) && defined(__GNUC__) && __GNUC__ < 8
+   /* gcc < 8 only has signed-integer-overflow which triggers on pointer overflow */
+#   define ZSTD_ALLOW_POINTER_OVERFLOW_ATTR __attribute__((no_sanitize("signed-integer-overflow")))
+#  else
+   /* older versions of clang [3.7, 5.0) will warn that pointer-overflow is ignored. */
+#   define ZSTD_ALLOW_POINTER_OVERFLOW_ATTR __attribute__((no_sanitize("pointer-overflow")))
+#  endif
+# else
+#  define ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+# endif
+#endif
+
+/*
+ * Helper function to perform a wrapped pointer difference without triggering
+ * UBSAN.
+ *
+ * @returns lhs - rhs with wrapping
+ */
+MEM_STATIC
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+ptrdiff_t ZSTD_wrappedPtrDiff(unsigned char const* lhs, unsigned char const* rhs)
+{
+    return lhs - rhs;
+}
+
+/*
+ * Helper function to perform a wrapped pointer add without triggering UBSAN.
+ *
+ * @return ptr + add with wrapping
+ */
+MEM_STATIC
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+unsigned char const* ZSTD_wrappedPtrAdd(unsigned char const* ptr, ptrdiff_t add)
+{
+    return ptr + add;
+}
+
+/*
+ * Helper function to perform a wrapped pointer subtraction without triggering
+ * UBSAN.
+ *
+ * @return ptr - sub with wrapping
+ */
+MEM_STATIC
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+unsigned char const* ZSTD_wrappedPtrSub(unsigned char const* ptr, ptrdiff_t sub)
+{
+    return ptr - sub;
+}
+
+/*
+ * Helper function to add to a pointer that works around C's undefined behavior
+ * of adding 0 to NULL.
+ *
+ * @returns `ptr + add` except it defines `NULL + 0 == NULL`.
+ */
+MEM_STATIC
+unsigned char* ZSTD_maybeNullPtrAdd(unsigned char* ptr, ptrdiff_t add)
+{
+    return add > 0 ? ptr + add : ptr;
+}
+
+/* Issue #3240 reports an ASAN failure on an llvm-mingw build. Out of an
+ * abundance of caution, disable our custom poisoning on mingw. */
+#ifdef __MINGW32__
+#ifndef ZSTD_ASAN_DONT_POISON_WORKSPACE
+#define ZSTD_ASAN_DONT_POISON_WORKSPACE 1
+#endif
+#ifndef ZSTD_MSAN_DONT_POISON_WORKSPACE
+#define ZSTD_MSAN_DONT_POISON_WORKSPACE 1
+#endif
+#endif
+


#endif /* ZSTD_COMPILER_H */
diff --git a/lib/zstd/common/cpu.h b/lib/zstd/common/cpu.h
index 0db7b42407ee..d8319a2bef4c 100644
--- a/lib/zstd/common/cpu.h
+++ b/lib/zstd/common/cpu.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
/*
- * Copyright (c) Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under both the BSD-style license (found in the
diff --git a/lib/zstd/common/debug.c b/lib/zstd/common/debug.c
index bb863c9ea616..8eb6aa9a3b20 100644
--- a/lib/zstd/common/debug.c
+++ b/lib/zstd/common/debug.c
@@ -1,7 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
/* ******************************************************************
 * debug
 * Part of FSE library
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
* * You can contact the author at : * - Source repository : https://github.com/Cyan4973/FiniteStateEntropy @@ -21,4 +22,10 @@ =20 #include "debug.h" =20 +#if (DEBUGLEVEL>=3D2) +/* We only use this when DEBUGLEVEL>=3D2, but we get -Werror=3Dpedantic er= rors if a + * translation unit is empty. So remove this from Linux kernel builds, but + * otherwise just leave it in. + */ int g_debuglevel =3D DEBUGLEVEL; +#endif diff --git a/lib/zstd/common/debug.h b/lib/zstd/common/debug.h index 6dd88d1fbd02..c8a10281f112 100644 --- a/lib/zstd/common/debug.h +++ b/lib/zstd/common/debug.h @@ -1,7 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* ****************************************************************** * debug * Part of FSE library - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * * You can contact the author at : * - Source repository : https://github.com/Cyan4973/FiniteStateEntropy @@ -33,7 +34,6 @@ #define DEBUG_H_12987983217 =20 =20 - /* static assert is triggered at compile time, leaving no runtime artefact. * static assert only works with compile-time constants. * Also, this variant can only be used inside a function. */ @@ -82,20 +82,27 @@ extern int g_debuglevel; /* the variable is only declar= ed, It's useful when enabling very verbose levels on selective conditions (such as position in s= rc) */ =20 -# define RAWLOG(l, ...) { \ - if (l<=3Dg_debuglevel) { \ - ZSTD_DEBUG_PRINT(__VA_ARGS__); \ - } } -# define DEBUGLOG(l, ...) { \ - if (l<=3Dg_debuglevel) { \ - ZSTD_DEBUG_PRINT(__FILE__ ": " __VA_ARGS__); \ - ZSTD_DEBUG_PRINT(" \n"); \ - } } +# define RAWLOG(l, ...) \ + do { \ + if (l<=3Dg_debuglevel) { \ + ZSTD_DEBUG_PRINT(__VA_ARGS__); \ + } \ + } while (0) + +#define STRINGIFY(x) #x +#define TOSTRING(x) STRINGIFY(x) +#define LINE_AS_STRING TOSTRING(__LINE__) + +# define DEBUGLOG(l, ...) \ + do { \ + if (l<=3Dg_debuglevel) { \ + ZSTD_DEBUG_PRINT(__FILE__ ":" LINE_AS_STRING ": " __VA_ARGS__)= ; \ + ZSTD_DEBUG_PRINT(" \n"); \ + } \ + } while (0) #else -# define RAWLOG(l, ...) {} /* disabled */ -# define DEBUGLOG(l, ...) {} /* disabled */ +# define RAWLOG(l, ...) do { } while (0) /* disabled */ +# define DEBUGLOG(l, ...) do { } while (0) /* disabled */ #endif =20 - - #endif /* DEBUG_H_12987983217 */ diff --git a/lib/zstd/common/entropy_common.c b/lib/zstd/common/entropy_com= mon.c index fef67056f052..6cdd82233fb5 100644 --- a/lib/zstd/common/entropy_common.c +++ b/lib/zstd/common/entropy_common.c @@ -1,6 +1,7 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* ****************************************************************** * Common functions of New Generation Entropy library - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. 
*
* You can contact the author at :
* - FSE+HUF source repository : https://github.com/Cyan4973/FiniteStateEntropy
@@ -19,8 +20,8 @@
#include "error_private.h"  /* ERR_*, ERROR */
#define FSE_STATIC_LINKING_ONLY  /* FSE_MIN_TABLELOG */
#include "fse.h"
-#define HUF_STATIC_LINKING_ONLY  /* HUF_TABLELOG_ABSOLUTEMAX */
#include "huf.h"
+#include "bits.h"  /* ZSTD_highbit32, ZSTD_countTrailingZeros32 */


/*=== Version ===*/
@@ -38,23 +39,6 @@ const char* HUF_getErrorName(size_t code) { return ERR_getErrorName(code); }
/*-**************************************************************
*  FSE NCount encoding-decoding
****************************************************************/
-static U32 FSE_ctz(U32 val)
-{
-    assert(val != 0);
-    {
-#   if (__GNUC__ >= 3)   /* GCC Intrinsic */
-        return __builtin_ctz(val);
-#   else   /* Software version */
-        U32 count = 0;
-        while ((val & 1) == 0) {
-            val >>= 1;
-            ++count;
-        }
-        return count;
-#   endif
-    }
-}
-
FORCE_INLINE_TEMPLATE
size_t FSE_readNCount_body(short* normalizedCounter, unsigned* maxSVPtr, unsigned* tableLogPtr,
                           const void* headerBuffer, size_t hbSize)
@@ -102,7 +86,7 @@ size_t FSE_readNCount_body(short* normalizedCounter, unsigned* maxSVPtr, unsigne
             * repeat.
             * Avoid UB by setting the high bit to 1.
             */
-            int repeats = FSE_ctz(~bitStream | 0x80000000) >> 1;
+            int repeats = ZSTD_countTrailingZeros32(~bitStream | 0x80000000) >> 1;
            while (repeats >= 12) {
                charnum += 3 * 12;
                if (LIKELY(ip <= iend-7)) {
@@ -113,7 +97,7 @@ size_t FSE_readNCount_body(short* normalizedCounter, unsigned* maxSVPtr, unsigne
                    ip = iend - 4;
                }
                bitStream = MEM_readLE32(ip) >> bitCount;
-                repeats = FSE_ctz(~bitStream | 0x80000000) >> 1;
+                repeats = ZSTD_countTrailingZeros32(~bitStream | 0x80000000) >> 1;
            }
            charnum += 3 * repeats;
            bitStream >>= 2 * repeats;
@@ -178,7 +162,7 @@ size_t FSE_readNCount_body(short* normalizedCounter, unsigned* maxSVPtr, unsigne
         * know that threshold > 1.
*/ if (remaining <=3D 1) break; - nbBits =3D BIT_highbit32(remaining) + 1; + nbBits =3D ZSTD_highbit32(remaining) + 1; threshold =3D 1 << (nbBits - 1); } if (charnum >=3D maxSV1) break; @@ -253,7 +237,7 @@ size_t HUF_readStats(BYTE* huffWeight, size_t hwSize, U= 32* rankStats, const void* src, size_t srcSize) { U32 wksp[HUF_READ_STATS_WORKSPACE_SIZE_U32]; - return HUF_readStats_wksp(huffWeight, hwSize, rankStats, nbSymbolsPtr,= tableLogPtr, src, srcSize, wksp, sizeof(wksp), /* bmi2 */ 0); + return HUF_readStats_wksp(huffWeight, hwSize, rankStats, nbSymbolsPtr,= tableLogPtr, src, srcSize, wksp, sizeof(wksp), /* flags */ 0); } =20 FORCE_INLINE_TEMPLATE size_t @@ -301,14 +285,14 @@ HUF_readStats_body(BYTE* huffWeight, size_t hwSize, U= 32* rankStats, if (weightTotal =3D=3D 0) return ERROR(corruption_detected); =20 /* get last non-null symbol weight (implied, total must be 2^n) */ - { U32 const tableLog =3D BIT_highbit32(weightTotal) + 1; + { U32 const tableLog =3D ZSTD_highbit32(weightTotal) + 1; if (tableLog > HUF_TABLELOG_MAX) return ERROR(corruption_detected); *tableLogPtr =3D tableLog; /* determine last weight */ { U32 const total =3D 1 << tableLog; U32 const rest =3D total - weightTotal; - U32 const verif =3D 1 << BIT_highbit32(rest); - U32 const lastWeight =3D BIT_highbit32(rest) + 1; + U32 const verif =3D 1 << ZSTD_highbit32(rest); + U32 const lastWeight =3D ZSTD_highbit32(rest) + 1; if (verif !=3D rest) return ERROR(corruption_detected); /* = last value must be a clean power of 2 */ huffWeight[oSize] =3D (BYTE)lastWeight; rankStats[lastWeight]++; @@ -345,13 +329,13 @@ size_t HUF_readStats_wksp(BYTE* huffWeight, size_t hw= Size, U32* rankStats, U32* nbSymbolsPtr, U32* tableLogPtr, const void* src, size_t srcSize, void* workSpace, size_t wkspSize, - int bmi2) + int flags) { #if DYNAMIC_BMI2 - if (bmi2) { + if (flags & HUF_flags_bmi2) { return HUF_readStats_body_bmi2(huffWeight, hwSize, rankStats, nbSy= mbolsPtr, tableLogPtr, src, srcSize, workSpace, wkspSize); } #endif - (void)bmi2; + (void)flags; return HUF_readStats_body_default(huffWeight, hwSize, rankStats, nbSym= bolsPtr, tableLogPtr, src, srcSize, workSpace, wkspSize); } diff --git a/lib/zstd/common/error_private.c b/lib/zstd/common/error_privat= e.c index 6d1135f8c373..6c3dbad838b6 100644 --- a/lib/zstd/common/error_private.c +++ b/lib/zstd/common/error_private.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -27,9 +28,11 @@ const char* ERR_getErrorString(ERR_enum code) case PREFIX(version_unsupported): return "Version not supported"; case PREFIX(frameParameter_unsupported): return "Unsupported frame par= ameter"; case PREFIX(frameParameter_windowTooLarge): return "Frame requires too= much memory for decoding"; - case PREFIX(corruption_detected): return "Corrupted block detected"; + case PREFIX(corruption_detected): return "Data corruption detected"; case PREFIX(checksum_wrong): return "Restored data doesn't match check= sum"; + case PREFIX(literals_headerWrong): return "Header of Literals' block d= oesn't respect format specification"; case PREFIX(parameter_unsupported): return "Unsupported parameter"; + case PREFIX(parameter_combination_unsupported): return "Unsupported co= mbination of parameters"; case PREFIX(parameter_outOfBound): return "Parameter is out of bound"; case PREFIX(init_missing): return "Context should be init first"; case PREFIX(memory_allocation): return "Allocation error : not enough = memory"; @@ -38,17 +41,23 @@ const char* ERR_getErrorString(ERR_enum code) case PREFIX(tableLog_tooLarge): return "tableLog requires too much mem= ory : unsupported"; case PREFIX(maxSymbolValue_tooLarge): return "Unsupported max Symbol V= alue : too large"; case PREFIX(maxSymbolValue_tooSmall): return "Specified maxSymbolValue= is too small"; + case PREFIX(cannotProduce_uncompressedBlock): return "This mode cannot= generate an uncompressed block"; + case PREFIX(stabilityCondition_notRespected): return "pledged buffer s= tability condition is not respected"; case PREFIX(dictionary_corrupted): return "Dictionary is corrupted"; case PREFIX(dictionary_wrong): return "Dictionary mismatch"; case PREFIX(dictionaryCreation_failed): return "Cannot create Dictiona= ry from provided samples"; case PREFIX(dstSize_tooSmall): return "Destination buffer is too small= "; case PREFIX(srcSize_wrong): return "Src size is incorrect"; case PREFIX(dstBuffer_null): return "Operation on NULL destination buf= fer"; + case PREFIX(noForwardProgress_destFull): return "Operation made no pro= gress over multiple calls, due to output buffer being full"; + case PREFIX(noForwardProgress_inputEmpty): return "Operation made no p= rogress over multiple calls, due to input being empty"; /* following error codes are not stable and may be removed or chan= ged in a future version */ case PREFIX(frameIndex_tooLarge): return "Frame index is too large"; case PREFIX(seekableIO): return "An I/O error occurred when reading/se= eking"; case PREFIX(dstBuffer_wrong): return "Destination buffer is wrong"; case PREFIX(srcBuffer_wrong): return "Source buffer is wrong"; + case PREFIX(sequenceProducer_failed): return "Block-level external seq= uence producer returned an error code"; + case PREFIX(externalSequences_invalid): return "External sequences are= not valid"; case PREFIX(maxCode): default: return notErrorCode; } diff --git a/lib/zstd/common/error_private.h b/lib/zstd/common/error_privat= e.h index ca5101e542fa..08ee87b68cca 100644 --- a/lib/zstd/common/error_private.h +++ b/lib/zstd/common/error_private.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -13,8 +14,6 @@ #ifndef ERROR_H_MODULE #define ERROR_H_MODULE =20 - - /* **************************************** * Dependencies ******************************************/ @@ -23,7 +22,6 @@ #include "debug.h" #include "zstd_deps.h" /* size_t */ =20 - /* **************************************** * Compiler-specific ******************************************/ @@ -49,8 +47,13 @@ ERR_STATIC unsigned ERR_isError(size_t code) { return (c= ode > ERROR(maxCode)); } ERR_STATIC ERR_enum ERR_getErrorCode(size_t code) { if (!ERR_isError(code)= ) return (ERR_enum)0; return (ERR_enum) (0-code); } =20 /* check and forward error code */ -#define CHECK_V_F(e, f) size_t const e =3D f; if (ERR_isError(e)) return e -#define CHECK_F(f) { CHECK_V_F(_var_err__, f); } +#define CHECK_V_F(e, f) \ + size_t const e =3D f; \ + do { \ + if (ERR_isError(e)) \ + return e; \ + } while (0) +#define CHECK_F(f) do { CHECK_V_F(_var_err__, f); } while (0) =20 =20 /*-**************************************** @@ -84,10 +87,12 @@ void _force_has_format_string(const char *format, ...) { * We want to force this function invocation to be syntactically correct, = but * we don't want to force runtime evaluation of its arguments. */ -#define _FORCE_HAS_FORMAT_STRING(...) \ - if (0) { \ - _force_has_format_string(__VA_ARGS__); \ - } +#define _FORCE_HAS_FORMAT_STRING(...) \ + do { \ + if (0) { \ + _force_has_format_string(__VA_ARGS__); \ + } \ + } while (0) =20 #define ERR_QUOTE(str) #str =20 @@ -98,48 +103,49 @@ void _force_has_format_string(const char *format, ...)= { * In order to do that (particularly, printing the conditional that failed= ), * this can't just wrap RETURN_ERROR(). */ -#define RETURN_ERROR_IF(cond, err, ...) \ - if (cond) { \ - RAWLOG(3, "%s:%d: ERROR!: check %s failed, returning %s", \ - __FILE__, __LINE__, ERR_QUOTE(cond), ERR_QUOTE(ERROR(err))); \ - _FORCE_HAS_FORMAT_STRING(__VA_ARGS__); \ - RAWLOG(3, ": " __VA_ARGS__); \ - RAWLOG(3, "\n"); \ - return ERROR(err); \ - } +#define RETURN_ERROR_IF(cond, err, ...) = \ + do { = \ + if (cond) { = \ + RAWLOG(3, "%s:%d: ERROR!: check %s failed, returning %s", = \ + __FILE__, __LINE__, ERR_QUOTE(cond), ERR_QUOTE(ERROR(err= ))); \ + _FORCE_HAS_FORMAT_STRING(__VA_ARGS__); = \ + RAWLOG(3, ": " __VA_ARGS__); = \ + RAWLOG(3, "\n"); = \ + return ERROR(err); = \ + } = \ + } while (0) =20 /* * Unconditionally return the specified error. * * In debug modes, prints additional information. */ -#define RETURN_ERROR(err, ...) \ - do { \ - RAWLOG(3, "%s:%d: ERROR!: unconditional check failed, returning %s", \ - __FILE__, __LINE__, ERR_QUOTE(ERROR(err))); \ - _FORCE_HAS_FORMAT_STRING(__VA_ARGS__); \ - RAWLOG(3, ": " __VA_ARGS__); \ - RAWLOG(3, "\n"); \ - return ERROR(err); \ - } while(0); +#define RETURN_ERROR(err, ...) = \ + do { = \ + RAWLOG(3, "%s:%d: ERROR!: unconditional check failed, returning %s= ", \ + __FILE__, __LINE__, ERR_QUOTE(ERROR(err))); = \ + _FORCE_HAS_FORMAT_STRING(__VA_ARGS__); = \ + RAWLOG(3, ": " __VA_ARGS__); = \ + RAWLOG(3, "\n"); = \ + return ERROR(err); = \ + } while(0) =20 /* * If the provided expression evaluates to an error code, returns that err= or code. * * In debug modes, prints additional information. */ -#define FORWARD_IF_ERROR(err, ...) 
\ - do { \ - size_t const err_code =3D (err); \ - if (ERR_isError(err_code)) { \ - RAWLOG(3, "%s:%d: ERROR!: forwarding error in %s: %s", \ - __FILE__, __LINE__, ERR_QUOTE(err), ERR_getErrorName(err_code= )); \ - _FORCE_HAS_FORMAT_STRING(__VA_ARGS__); \ - RAWLOG(3, ": " __VA_ARGS__); \ - RAWLOG(3, "\n"); \ - return err_code; \ - } \ - } while(0); - +#define FORWARD_IF_ERROR(err, ...) = \ + do { = \ + size_t const err_code =3D (err); = \ + if (ERR_isError(err_code)) { = \ + RAWLOG(3, "%s:%d: ERROR!: forwarding error in %s: %s", = \ + __FILE__, __LINE__, ERR_QUOTE(err), ERR_getErrorName(err= _code)); \ + _FORCE_HAS_FORMAT_STRING(__VA_ARGS__); = \ + RAWLOG(3, ": " __VA_ARGS__); = \ + RAWLOG(3, "\n"); = \ + return err_code; = \ + } = \ + } while(0) =20 #endif /* ERROR_H_MODULE */ diff --git a/lib/zstd/common/fse.h b/lib/zstd/common/fse.h index 4507043b2287..b36ce7a2a8c3 100644 --- a/lib/zstd/common/fse.h +++ b/lib/zstd/common/fse.h @@ -1,7 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* ****************************************************************** * FSE : Finite State Entropy codec * Public Prototypes declaration - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * * You can contact the author at : * - Source repository : https://github.com/Cyan4973/FiniteStateEntropy @@ -11,8 +12,6 @@ * in the COPYING file in the root directory of this source tree). * You may select, at your option, one of the above-listed licenses. ****************************************************************** */ - - #ifndef FSE_H #define FSE_H =20 @@ -22,7 +21,6 @@ ******************************************/ #include "zstd_deps.h" /* size_t, ptrdiff_t */ =20 - /*-***************************************** * FSE_PUBLIC_API : control library symbols visibility ******************************************/ @@ -50,34 +48,6 @@ FSE_PUBLIC_API unsigned FSE_versionNumber(void); /*< library version num= ber; to be used when checking dll version */ =20 =20 -/*-**************************************** -* FSE simple functions -******************************************/ -/*! FSE_compress() : - Compress content of buffer 'src', of size 'srcSize', into destination = buffer 'dst'. - 'dst' buffer must be already allocated. Compression runs faster is dst= Capacity >=3D FSE_compressBound(srcSize). - @return : size of compressed data (<=3D dstCapacity). - Special values : if return =3D=3D 0, srcData is not compressible =3D> = Nothing is stored within dst !!! - if return =3D=3D 1, srcData is a single byte symbol *= srcSize times. Use RLE compression instead. - if FSE_isError(return), compression failed (more deta= ils using FSE_getErrorName()) -*/ -FSE_PUBLIC_API size_t FSE_compress(void* dst, size_t dstCapacity, - const void* src, size_t srcSize); - -/*! FSE_decompress(): - Decompress FSE data from buffer 'cSrc', of size 'cSrcSize', - into already allocated destination buffer 'dst', of size 'dstCapacity'. - @return : size of regenerated data (<=3D maxDstSize), - or an error code, which can be tested using FSE_isError() . - - ** Important ** : FSE_decompress() does not decompress non-compressibl= e nor RLE data !!! - Why ? : making this distinction requires a header. - Header management is intentionally delegated to the user layer, which = can better manage special cases. 
-*/
-FSE_PUBLIC_API size_t FSE_decompress(void* dst,  size_t dstCapacity,
-                               const void* cSrc, size_t cSrcSize);
-
-
 /*-*****************************************
 *  Tool functions
 ******************************************/
@@ -88,20 +58,6 @@ FSE_PUBLIC_API unsigned FSE_isError(size_t code);        /* tells if a return
 FSE_PUBLIC_API const char* FSE_getErrorName(size_t code);   /* provides error code string (useful for debugging) */
 
 
-/*-*****************************************
-*  FSE advanced functions
-******************************************/
-/*! FSE_compress2() :
-    Same as FSE_compress(), but allows the selection of 'maxSymbolValue' and 'tableLog'
-    Both parameters can be defined as '0' to mean : use default value
-    @return : size of compressed data
-    Special values : if return == 0, srcData is not compressible => Nothing is stored within cSrc !!!
-                     if return == 1, srcData is a single byte symbol * srcSize times. Use RLE compression.
-                     if FSE_isError(return), it's an error code.
-*/
-FSE_PUBLIC_API size_t FSE_compress2 (void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog);
-
-
 /*-*****************************************
 *  FSE detailed API
 ******************************************/
@@ -161,8 +117,6 @@ FSE_PUBLIC_API size_t FSE_writeNCount (void* buffer, size_t bufferSize,
 /*! Constructor and Destructor of FSE_CTable.
     Note that FSE_CTable size depends on 'tableLog' and 'maxSymbolValue' */
 typedef unsigned FSE_CTable;   /* don't allocate that. It's only meant to be more restrictive than void* */
-FSE_PUBLIC_API FSE_CTable* FSE_createCTable (unsigned maxSymbolValue, unsigned tableLog);
-FSE_PUBLIC_API void        FSE_freeCTable (FSE_CTable* ct);
 
 /*! FSE_buildCTable():
     Builds `ct`, which must be already allocated, using FSE_createCTable().
@@ -238,23 +192,7 @@ FSE_PUBLIC_API size_t FSE_readNCount_bmi2(short* normalizedCounter,
                           unsigned* maxSymbolValuePtr, unsigned* tableLogPtr,
                           const void* rBuffer, size_t rBuffSize, int bmi2);
 
-/*! Constructor and Destructor of FSE_DTable.
-    Note that its size depends on 'tableLog' */
 typedef unsigned FSE_DTable;   /* don't allocate that. It's just a way to be more restrictive than void* */
-FSE_PUBLIC_API FSE_DTable* FSE_createDTable(unsigned tableLog);
-FSE_PUBLIC_API void        FSE_freeDTable(FSE_DTable* dt);
-
-/*! FSE_buildDTable():
-    Builds 'dt', which must be already allocated, using FSE_createDTable().
-    return : 0, or an errorCode, which can be tested using FSE_isError() */
-FSE_PUBLIC_API size_t FSE_buildDTable (FSE_DTable* dt, const short* normalizedCounter, unsigned maxSymbolValue, unsigned tableLog);
-
-/*! FSE_decompress_usingDTable():
-    Decompress compressed source `cSrc` of size `cSrcSize` using `dt`
-    into `dst` which must be already allocated.
-    @return : size of regenerated data (necessarily <= `dstCapacity`),
-              or an errorCode, which can be tested using FSE_isError() */
-FSE_PUBLIC_API size_t FSE_decompress_usingDTable(void* dst, size_t dstCapacity, const void* cSrc, size_t cSrcSize, const FSE_DTable* dt);
 
 /*!
 Tutorial :
@@ -286,13 +224,11 @@ If there is an error, the function will return an error code, which can be teste
 
 #endif  /* FSE_H */
 
+
 #if !defined(FSE_H_FSE_STATIC_LINKING_ONLY)
 #define FSE_H_FSE_STATIC_LINKING_ONLY
-
-/* *** Dependency *** */
 #include "bitstream.h"
 
-
 /* *****************************************
 *  Static allocation
 *******************************************/
@@ -317,16 +253,6 @@ If there is an error, the function will return an error code, which can be teste
 unsigned FSE_optimalTableLog_internal(unsigned maxTableLog, size_t srcSize, unsigned maxSymbolValue, unsigned minus);
 /*< same as FSE_optimalTableLog(), which used `minus==2` */
 
-/* FSE_compress_wksp() :
- * Same as FSE_compress2(), but using an externally allocated scratch buffer (`workSpace`).
- * FSE_COMPRESS_WKSP_SIZE_U32() provides the minimum size required for `workSpace` as a table of FSE_CTable.
- */
-#define FSE_COMPRESS_WKSP_SIZE_U32(maxTableLog, maxSymbolValue)   ( FSE_CTABLE_SIZE_U32(maxTableLog, maxSymbolValue) + ((maxTableLog > 12) ? (1 << (maxTableLog - 2)) : 1024) )
-size_t FSE_compress_wksp (void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void* workSpace, size_t wkspSize);
-
-size_t FSE_buildCTable_raw (FSE_CTable* ct, unsigned nbBits);
-/*< build a fake FSE_CTable, designed for a flat distribution, where each symbol uses nbBits */
-
 size_t FSE_buildCTable_rle (FSE_CTable* ct, unsigned char symbolValue);
 /*< build a fake FSE_CTable, designed to compress always the same symbolValue */
 
@@ -344,19 +270,11 @@ size_t FSE_buildCTable_wksp(FSE_CTable* ct, const short* normalizedCounter, unsi
 FSE_PUBLIC_API size_t FSE_buildDTable_wksp(FSE_DTable* dt, const short* normalizedCounter, unsigned maxSymbolValue, unsigned tableLog, void* workSpace, size_t wkspSize);
 /*< Same as FSE_buildDTable(), using an externally allocated `workspace` produced with `FSE_BUILD_DTABLE_WKSP_SIZE_U32(maxSymbolValue)` */
 
-size_t FSE_buildDTable_raw (FSE_DTable* dt, unsigned nbBits);
-/*< build a fake FSE_DTable, designed to read a flat distribution where each symbol uses nbBits */
-
-size_t FSE_buildDTable_rle (FSE_DTable* dt, unsigned char symbolValue);
-/*< build a fake FSE_DTable, designed to always generate the same symbolValue */
-
-#define FSE_DECOMPRESS_WKSP_SIZE_U32(maxTableLog, maxSymbolValue) (FSE_DTABLE_SIZE_U32(maxTableLog) + FSE_BUILD_DTABLE_WKSP_SIZE_U32(maxTableLog, maxSymbolValue) + (FSE_MAX_SYMBOL_VALUE + 1) / 2 + 1)
+#define FSE_DECOMPRESS_WKSP_SIZE_U32(maxTableLog, maxSymbolValue) (FSE_DTABLE_SIZE_U32(maxTableLog) + 1 + FSE_BUILD_DTABLE_WKSP_SIZE_U32(maxTableLog, maxSymbolValue) + (FSE_MAX_SYMBOL_VALUE + 1) / 2 + 1)
 #define FSE_DECOMPRESS_WKSP_SIZE(maxTableLog, maxSymbolValue) (FSE_DECOMPRESS_WKSP_SIZE_U32(maxTableLog, maxSymbolValue) * sizeof(unsigned))
-size_t FSE_decompress_wksp(void* dst, size_t dstCapacity, const void* cSrc, size_t cSrcSize, unsigned maxLog, void* workSpace, size_t wkspSize);
-/*< same as FSE_decompress(), using an externally allocated `workSpace` produced with `FSE_DECOMPRESS_WKSP_SIZE_U32(maxLog, maxSymbolValue)` */
-
 size_t FSE_decompress_wksp_bmi2(void* dst, size_t dstCapacity, const void* cSrc, size_t cSrcSize, unsigned maxLog, void* workSpace, size_t wkspSize, int bmi2);
-/*< Same as FSE_decompress_wksp() but with dynamic BMI2 support. Pass 1 if your CPU supports BMI2 or 0 if it doesn't. */
+/*< same as FSE_decompress(), using an externally allocated `workSpace` produced with `FSE_DECOMPRESS_WKSP_SIZE_U32(maxLog, maxSymbolValue)`.
+ * Set bmi2 to 1 if your CPU supports BMI2 or 0 if it doesn't */
 
 typedef enum {
    FSE_repeat_none,  /*< Cannot use the previous table */
@@ -539,20 +457,20 @@ MEM_STATIC void FSE_encodeSymbol(BIT_CStream_t* bitC, FSE_CState_t* statePtr, un
    FSE_symbolCompressionTransform const symbolTT = ((const FSE_symbolCompressionTransform*)(statePtr->symbolTT))[symbol];
    const U16* const stateTable = (const U16*)(statePtr->stateTable);
    U32 const nbBitsOut  = (U32)((statePtr->value + symbolTT.deltaNbBits) >> 16);
-    BIT_addBits(bitC, statePtr->value, nbBitsOut);
+    BIT_addBits(bitC, (BitContainerType)statePtr->value, nbBitsOut);
    statePtr->value = stateTable[ (statePtr->value >> nbBitsOut) + symbolTT.deltaFindState];
 }
 
 MEM_STATIC void FSE_flushCState(BIT_CStream_t* bitC, const FSE_CState_t* statePtr)
 {
-    BIT_addBits(bitC, statePtr->value, statePtr->stateLog);
+    BIT_addBits(bitC, (BitContainerType)statePtr->value, statePtr->stateLog);
    BIT_flushBits(bitC);
 }
 
 
 /* FSE_getMaxNbBits() :
 * Approximate maximum cost of a symbol, in bits.
- * Fractional get rounded up (i.e : a symbol with a normalized frequency of 3 gives the same result as a frequency of 2)
+ * Fractional get rounded up (i.e. a symbol with a normalized frequency of 3 gives the same result as a frequency of 2)
 * note 1 : assume symbolValue is valid (<= maxSymbolValue)
 * note 2 : if freq[symbolValue]==0, @return a fake cost of tableLog+1 bits */
 MEM_STATIC U32 FSE_getMaxNbBits(const void* symbolTTPtr, U32 symbolValue)
@@ -705,7 +623,4 @@ MEM_STATIC unsigned FSE_endOfDState(const FSE_DState_t* DStatePtr)
 
 #define FSE_TABLESTEP(tableSize)   (((tableSize)>>1) + ((tableSize)>>3) + 3)
 
-
 #endif /* FSE_STATIC_LINKING_ONLY */
-
-
diff --git a/lib/zstd/common/fse_decompress.c b/lib/zstd/common/fse_decompress.c
index 8dcb8ca39767..15081d8dc607 100644
--- a/lib/zstd/common/fse_decompress.c
+++ b/lib/zstd/common/fse_decompress.c
@@ -1,6 +1,7 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
 /* ******************************************************************
 * FSE : Finite State Entropy decoder
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 *
 * You can contact the author at :
 * - FSE source repository : https://github.com/Cyan4973/FiniteStateEntropy
@@ -22,8 +23,8 @@
 #define FSE_STATIC_LINKING_ONLY
 #include "fse.h"
 #include "error_private.h"
-#define ZSTD_DEPS_NEED_MALLOC
-#include "zstd_deps.h"
+#include "zstd_deps.h"   /* ZSTD_memcpy */
+#include "bits.h"        /* ZSTD_highbit32 */
 
 
 /* **************************************************************
@@ -55,19 +56,6 @@
 #define FSE_FUNCTION_NAME(X,Y) FSE_CAT(X,Y)
 #define FSE_TYPE_NAME(X,Y) FSE_CAT(X,Y)
 
-
-/* Function templates */
-FSE_DTable* FSE_createDTable (unsigned tableLog)
-{
-    if (tableLog > FSE_TABLELOG_ABSOLUTE_MAX) tableLog = FSE_TABLELOG_ABSOLUTE_MAX;
-    return (FSE_DTable*)ZSTD_malloc( FSE_DTABLE_SIZE_U32(tableLog) * sizeof (U32) );
-}
-
-void FSE_freeDTable (FSE_DTable* dt)
-{
-    ZSTD_free(dt);
-}
-
 static size_t FSE_buildDTable_internal(FSE_DTable* dt, const short* normalizedCounter, unsigned maxSymbolValue, unsigned tableLog, void* workSpace, size_t wkspSize)
 {
    void* const tdPtr = dt+1;   /* because *dt is unsigned, 32-bits aligned on 32-bits */
@@ -96,7 +84,7 @@ static size_t FSE_buildDTable_internal(FSE_DTable* dt, const short* normalizedCo
                symbolNext[s] = 1;
            } else {
                if (normalizedCounter[s] >= largeLimit) DTableH.fastMode=0;
-                symbolNext[s] = normalizedCounter[s];
+                symbolNext[s] = (U16)normalizedCounter[s];
        }   }   }
        ZSTD_memcpy(dt, &DTableH, sizeof(DTableH));
    }
@@ -111,8 +99,7 @@ static size_t FSE_buildDTable_internal(FSE_DTable* dt, const short* normalizedCo
         * all symbols have counts <= 8. We ensure we have 8 bytes at the end of
         * our buffer to handle the over-write.
         */
-        {
-            U64 const add = 0x0101010101010101ull;
+        {   U64 const add = 0x0101010101010101ull;
            size_t pos = 0;
            U64 sv = 0;
            U32 s;
@@ -123,14 +110,13 @@ static size_t FSE_buildDTable_internal(FSE_DTable* dt, const short* normalizedCo
                for (i = 8; i < n; i += 8) {
                    MEM_write64(spread + pos + i, sv);
                }
-                pos += n;
-            }
-        }
+                pos += (size_t)n;
+        }   }
        /* Now we spread those positions across the table.
-         * The benefit of doing it in two stages is that we avoid the the
+         * The benefit of doing it in two stages is that we avoid the
         * variable size inner loop, which caused lots of branch misses.
         * Now we can run through all the positions without any branch misses.
-         * We unroll the loop twice, since that is what emperically worked best.
+         * We unroll the loop twice, since that is what empirically worked best.
         */
        {   size_t position = 0;
@@ -166,7 +152,7 @@ static size_t FSE_buildDTable_internal(FSE_DTable* dt, const short* normalizedCo
    for (u=0; utableLog = 0;
-    DTableH->fastMode = 0;
-
-    cell->newState = 0;
-    cell->symbol = symbolValue;
-    cell->nbBits = 0;
-
-    return 0;
-}
-
-
-size_t FSE_buildDTable_raw (FSE_DTable* dt, unsigned nbBits)
-{
-    void* ptr = dt;
-    FSE_DTableHeader* const DTableH = (FSE_DTableHeader*)ptr;
-    void* dPtr = dt + 1;
-    FSE_decode_t* const dinfo = (FSE_decode_t*)dPtr;
-    const unsigned tableSize = 1 << nbBits;
-    const unsigned tableMask = tableSize - 1;
-    const unsigned maxSV1 = tableMask+1;
-    unsigned s;
-
-    /* Sanity checks */
-    if (nbBits < 1) return ERROR(GENERIC);         /* min size */
-
-    /* Build Decoding Table */
-    DTableH->tableLog = (U16)nbBits;
-    DTableH->fastMode = 1;
-    for (s=0; sfastMode;
-
-    /* select fast mode (static) */
-    if (fastMode) return FSE_decompress_usingDTable_generic(dst, originalSize, cSrc, cSrcSize, dt, 1);
-    return FSE_decompress_usingDTable_generic(dst, originalSize, cSrc, cSrcSize, dt, 0);
-}
-
-
-size_t FSE_decompress_wksp(void* dst, size_t dstCapacity, const void* cSrc, size_t cSrcSize, unsigned maxLog, void* workSpace, size_t wkspSize)
-{
-    return FSE_decompress_wksp_bmi2(dst, dstCapacity, cSrc, cSrcSize, maxLog, workSpace, wkspSize, /* bmi2 */ 0);
+    assert(op >= ostart);
+    return (size_t)(op-ostart);
 }
 
 typedef struct {
    short ncount[FSE_MAX_SYMBOL_VALUE + 1];
-    FSE_DTable dtable[]; /* Dynamically sized */
 } FSE_DecompressWksp;
 
 
@@ -327,13 +252,18 @@ FORCE_INLINE_TEMPLATE size_t FSE_decompress_wksp_body(
    unsigned tableLog;
    unsigned maxSymbolValue = FSE_MAX_SYMBOL_VALUE;
    FSE_DecompressWksp* const wksp = (FSE_DecompressWksp*)workSpace;
+    size_t const dtablePos = sizeof(FSE_DecompressWksp) / sizeof(FSE_DTable);
+    FSE_DTable* const dtable = (FSE_DTable*)workSpace + dtablePos;
 
-    DEBUG_STATIC_ASSERT((FSE_MAX_SYMBOL_VALUE + 1) % 2 == 0);
+    FSE_STATIC_ASSERT((FSE_MAX_SYMBOL_VALUE + 1) % 2 == 0);
    if (wkspSize < sizeof(*wksp)) return ERROR(GENERIC);
 
+    /* correct offset to dtable depends on this property */
+    FSE_STATIC_ASSERT(sizeof(FSE_DecompressWksp) % sizeof(FSE_DTable) == 0);
+
    /* normal FSE decoding mode */
-    {
-        size_t const NCountLength = FSE_readNCount_bmi2(wksp->ncount, &maxSymbolValue, &tableLog, istart, cSrcSize, bmi2);
+    {   size_t const NCountLength =
+            FSE_readNCount_bmi2(wksp->ncount, &maxSymbolValue, &tableLog, istart, cSrcSize, bmi2);
        if (FSE_isError(NCountLength)) return NCountLength;
        if (tableLog > maxLog) return ERROR(tableLog_tooLarge);
        assert(NCountLength <= cSrcSize);
@@ -342,19 +272,20 @@ FORCE_INLINE_TEMPLATE size_t FSE_decompress_wksp_body(
    }
 
    if (FSE_DECOMPRESS_WKSP_SIZE(tableLog, maxSymbolValue) > wkspSize) return ERROR(tableLog_tooLarge);
-    workSpace = wksp->dtable + FSE_DTABLE_SIZE_U32(tableLog);
+    assert(sizeof(*wksp) + FSE_DTABLE_SIZE(tableLog) <= wkspSize);
+    workSpace = (BYTE*)workSpace + sizeof(*wksp) + FSE_DTABLE_SIZE(tableLog);
    wkspSize -= sizeof(*wksp) + FSE_DTABLE_SIZE(tableLog);
 
-    CHECK_F( FSE_buildDTable_internal(wksp->dtable, wksp->ncount, maxSymbolValue, tableLog, workSpace, wkspSize) );
+    CHECK_F( FSE_buildDTable_internal(dtable, wksp->ncount, maxSymbolValue, tableLog, workSpace, wkspSize) );
 
    {
-        const void* ptr = wksp->dtable;
+        const void* ptr = dtable;
        const FSE_DTableHeader* DTableH = (const FSE_DTableHeader*)ptr;
        const U32 fastMode = DTableH->fastMode;
 
        /* select fast mode (static) */
-        if (fastMode) return FSE_decompress_usingDTable_generic(dst, dstCapacity, ip, cSrcSize, wksp->dtable, 1);
-        return FSE_decompress_usingDTable_generic(dst, dstCapacity, ip, cSrcSize, wksp->dtable, 0);
+        if (fastMode) return FSE_decompress_usingDTable_generic(dst, dstCapacity, ip, cSrcSize, dtable, 1);
+        return FSE_decompress_usingDTable_generic(dst, dstCapacity, ip, cSrcSize, dtable, 0);
    }
 }
 
@@ -382,9 +313,4 @@ size_t FSE_decompress_wksp_bmi2(void* dst, size_t dstCapacity, const void* cSrc,
    return FSE_decompress_wksp_body_default(dst, dstCapacity, cSrc, cSrcSize, maxLog, workSpace, wkspSize);
 }
 
-
-typedef FSE_DTable DTable_max_t[FSE_DTABLE_SIZE_U32(FSE_MAX_TABLELOG)];
-
-
-
 #endif /* FSE_COMMONDEFS_ONLY */
diff --git a/lib/zstd/common/huf.h b/lib/zstd/common/huf.h
index 5042ff870308..49736dcd8f49 100644
--- a/lib/zstd/common/huf.h
+++ b/lib/zstd/common/huf.h
@@ -1,7 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /* ******************************************************************
 * huff0 huffman codec,
 * part of Finite State Entropy library
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 *
 * You can contact the author at :
 * - Source repository : https://github.com/Cyan4973/FiniteStateEntropy
@@ -12,105 +13,26 @@
 * You may select, at your option, one of the above-listed licenses.
 ****************************************************************** */
 
-
 #ifndef HUF_H_298734234
 #define HUF_H_298734234
 
 /* *** Dependencies *** */
 #include "zstd_deps.h"    /* size_t */
-
-
-/* *** library symbols visibility *** */
-/* Note : when linking with -fvisibility=hidden on gcc, or by default on Visual,
- *        HUF symbols remain "private" (internal symbols for library only).
- *        Set macro FSE_DLL_EXPORT to 1 if you want HUF symbols visible on DLL interface */
-#if defined(FSE_DLL_EXPORT) && (FSE_DLL_EXPORT==1) && defined(__GNUC__) && (__GNUC__ >= 4)
-#  define HUF_PUBLIC_API __attribute__ ((visibility ("default")))
-#elif defined(FSE_DLL_EXPORT) && (FSE_DLL_EXPORT==1)   /* Visual expected */
-#  define HUF_PUBLIC_API __declspec(dllexport)
-#elif defined(FSE_DLL_IMPORT) && (FSE_DLL_IMPORT==1)
-#  define HUF_PUBLIC_API __declspec(dllimport)  /* not required, just to generate faster code (saves a function pointer load from IAT and an indirect jump) */
-#else
-#  define HUF_PUBLIC_API
-#endif
-
-
-/* ========================== */
-/* ***  simple functions  *** */
-/* ========================== */
-
-/* HUF_compress() :
- *  Compress content from buffer 'src', of size 'srcSize', into buffer 'dst'.
- *  'dst' buffer must be already allocated.
- *  Compression runs faster if `dstCapacity` >= HUF_compressBound(srcSize).
- *  `srcSize` must be <= `HUF_BLOCKSIZE_MAX` == 128 KB.
- * @return : size of compressed data (<= `dstCapacity`).
- *           Special values : if return == 0, srcData is not compressible => Nothing is stored within dst !!!
- *                            if HUF_isError(return), compression failed (more details using HUF_getErrorName())
- */
-HUF_PUBLIC_API size_t HUF_compress(void* dst, size_t dstCapacity,
-                             const void* src, size_t srcSize);
-
-/* HUF_decompress() :
- *  Decompress HUF data from buffer 'cSrc', of size 'cSrcSize',
- *  into already allocated buffer 'dst', of minimum size 'dstSize'.
- *  `originalSize` : **must** be the ***exact*** size of original (uncompressed) data.
- *  Note : in contrast with FSE, HUF_decompress can regenerate
- *         RLE (cSrcSize==1) and uncompressed (cSrcSize==dstSize) data,
- *         because it knows size to regenerate (originalSize).
- * @return : size of regenerated data (== originalSize),
- *           or an error code, which can be tested using HUF_isError()
- */
-HUF_PUBLIC_API size_t HUF_decompress(void* dst,  size_t originalSize,
-                               const void* cSrc, size_t cSrcSize);
-
+#include "mem.h"    /* U32 */
+#define FSE_STATIC_LINKING_ONLY
+#include "fse.h"
 
 /* ***   Tool functions *** */
-#define HUF_BLOCKSIZE_MAX (128 * 1024)                  /*< maximum input size for a single block compressed with HUF_compress */
-HUF_PUBLIC_API size_t HUF_compressBound(size_t size);   /*< maximum compressed size (worst case) */
+#define HUF_BLOCKSIZE_MAX (128 * 1024)   /*< maximum input size for a single block compressed with HUF_compress */
+size_t HUF_compressBound(size_t size);   /*< maximum compressed size (worst case) */
 
 /* Error Management */
-HUF_PUBLIC_API unsigned    HUF_isError(size_t code);       /*< tells if a return value is an error code */
-HUF_PUBLIC_API const char* HUF_getErrorName(size_t code);  /*< provides error code string (useful for debugging) */
+unsigned    HUF_isError(size_t code);       /*< tells if a return value is an error code */
+const char* HUF_getErrorName(size_t code);  /*< provides error code string (useful for debugging) */
 
 
-/* ***   Advanced function   *** */
-
-/* HUF_compress2() :
- *  Same as HUF_compress(), but offers control over `maxSymbolValue` and `tableLog`.
- *  `maxSymbolValue` must be <= HUF_SYMBOLVALUE_MAX .
- *  `tableLog` must be `<= HUF_TABLELOG_MAX` . */
-HUF_PUBLIC_API size_t HUF_compress2 (void* dst, size_t dstCapacity,
-                               const void* src, size_t srcSize,
-                               unsigned maxSymbolValue, unsigned tableLog);
-
-/* HUF_compress4X_wksp() :
- *  Same as HUF_compress2(), but uses externally allocated `workSpace`.
- *  `workspace` must be at least as large as HUF_WORKSPACE_SIZE */
 #define HUF_WORKSPACE_SIZE ((8 << 10) + 512 /* sorting scratch space */)
 #define HUF_WORKSPACE_SIZE_U64 (HUF_WORKSPACE_SIZE / sizeof(U64))
-HUF_PUBLIC_API size_t HUF_compress4X_wksp (void* dst, size_t dstCapacity,
-                                     const void* src, size_t srcSize,
-                                     unsigned maxSymbolValue, unsigned tableLog,
-                                     void* workSpace, size_t wkspSize);
-
-#endif   /* HUF_H_298734234 */
-
-/* ******************************************************************
- *  WARNING !!
- *  The following section contains advanced and experimental definitions
- *  which shall never be used in the context of a dynamic library,
- *  because they are not guaranteed to remain stable in the future.
- *  Only consider them in association with static linking.
- * *****************************************************************/
-#if !defined(HUF_H_HUF_STATIC_LINKING_ONLY)
-#define HUF_H_HUF_STATIC_LINKING_ONLY
-
-/* *** Dependencies *** */
-#include "mem.h"   /* U32 */
-#define FSE_STATIC_LINKING_ONLY
-#include "fse.h"
-
 
 /* *** Constants *** */
 #define HUF_TABLELOG_MAX      12      /* max runtime value of tableLog (due to static allocation); can be modified up to HUF_TABLELOG_ABSOLUTEMAX */
@@ -151,25 +73,49 @@ typedef U32 HUF_DTable;
 /* ****************************************
 *  Advanced decompression functions
 ******************************************/
-size_t HUF_decompress4X1 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /*< single-symbol decoder */
-#ifndef HUF_FORCE_DECOMPRESS_X1
-size_t HUF_decompress4X2 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /*< double-symbols decoder */
-#endif
 
-size_t HUF_decompress4X_DCtx (HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /*< decodes RLE and uncompressed */
-size_t HUF_decompress4X_hufOnly(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /*< considers RLE and uncompressed as errors */
-size_t HUF_decompress4X_hufOnly_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize);   /*< considers RLE and uncompressed as errors */
-size_t HUF_decompress4X1_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /*< single-symbol decoder */
-size_t HUF_decompress4X1_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize);   /*< single-symbol decoder */
-#ifndef HUF_FORCE_DECOMPRESS_X1
-size_t HUF_decompress4X2_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /*< double-symbols decoder */
-size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize);   /*< double-symbols decoder */
-#endif
+/*
+ * Huffman flags bitset.
+ * For all flags, 0 is the default value.
+ */
+typedef enum {
+    /*
+     * If compiled with DYNAMIC_BMI2: Set flag only if the CPU supports BMI2 at runtime.
+     * Otherwise: Ignored.
+     */
+    HUF_flags_bmi2 = (1 << 0),
+    /*
+     * If set: Test possible table depths to find the one that produces the smallest header + encoded size.
+     * If unset: Use heuristic to find the table depth.
+     */
+    HUF_flags_optimalDepth = (1 << 1),
+    /*
+     * If set: If the previous table can encode the input, always reuse the previous table.
+     * If unset: If the previous table can encode the input, reuse the previous table if it results in a smaller output.
+     */
+    HUF_flags_preferRepeat = (1 << 2),
+    /*
+     * If set: Sample the input and check if the sample is uncompressible, if it is then don't attempt to compress.
+     * If unset: Always histogram the entire input.
+     */
+    HUF_flags_suspectUncompressible = (1 << 3),
+    /*
+     * If set: Don't use assembly implementations
+     * If unset: Allow using assembly implementations
+     */
+    HUF_flags_disableAsm = (1 << 4),
+    /*
+     * If set: Don't use the fast decoding loop, always use the fallback decoding loop.
+     * If unset: Use the fast decoding loop when possible.
+     */
+    HUF_flags_disableFast = (1 << 5)
+} HUF_flags_e;
 
 
 /* ****************************************
 *  HUF detailed API
 * ****************************************/
+#define HUF_OPTIMAL_DEPTH_THRESHOLD ZSTD_btultra
 
 /*! HUF_compress() does the following:
 *  1. count symbol occurrence from source[] into table count[] using FSE_count() (exposed within "fse.h")
@@ -182,12 +128,12 @@ size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize,
 *  For example, it's possible to compress several blocks using the same 'CTable',
 *  or to save and regenerate 'CTable' using external methods.
 */
-unsigned HUF_optimalTableLog(unsigned maxTableLog, size_t srcSize, unsigned maxSymbolValue);
-size_t HUF_buildCTable (HUF_CElt* CTable, const unsigned* count, unsigned maxSymbolValue, unsigned maxNbBits);   /* @return : maxNbBits; CTable and count can overlap. In which case, CTable will overwrite count content */
-size_t HUF_writeCTable (void* dst, size_t maxDstSize, const HUF_CElt* CTable, unsigned maxSymbolValue, unsigned huffLog);
+unsigned HUF_minTableLog(unsigned symbolCardinality);
+unsigned HUF_cardinality(const unsigned* count, unsigned maxSymbolValue);
+unsigned HUF_optimalTableLog(unsigned maxTableLog, size_t srcSize, unsigned maxSymbolValue, void* workSpace,
+ size_t wkspSize, HUF_CElt* table, const unsigned* count, int flags); /* table is used as scratch space for building and testing tables, not a return value */
 size_t HUF_writeCTable_wksp(void* dst, size_t maxDstSize, const HUF_CElt* CTable, unsigned maxSymbolValue, unsigned huffLog, void* workspace, size_t workspaceSize);
-size_t HUF_compress4X_usingCTable(void* dst, size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable);
-size_t HUF_compress4X_usingCTable_bmi2(void* dst, size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable, int bmi2);
+size_t HUF_compress4X_usingCTable(void* dst, size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable, int flags);
 size_t HUF_estimateCompressedSize(const HUF_CElt* CTable, const unsigned* count, unsigned maxSymbolValue);
 int HUF_validateCTable(const HUF_CElt* CTable, const unsigned* count, unsigned maxSymbolValue);
 
@@ -196,6 +142,7 @@ typedef enum {
    HUF_repeat_check, /*< Can use the previous table but it must be checked. Note : The previous table must have been constructed by HUF_compress{1, 4}X_repeat */
    HUF_repeat_valid  /*< Can use the previous table and it is assumed to be valid */
 } HUF_repeat;
+
 /* HUF_compress4X_repeat() :
 *  Same as HUF_compress4X_wksp(), but considers using hufTable if *repeat != HUF_repeat_none.
 *  If it uses hufTable it does not modify hufTable or repeat.
@@ -206,13 +153,13 @@ size_t HUF_compress4X_repeat(void* dst, size_t dstSize,
                       const void* src, size_t srcSize,
                       unsigned maxSymbolValue, unsigned tableLog,
                       void* workSpace, size_t wkspSize,    /*< `workSpace` must be aligned on 4-bytes boundaries, `wkspSize` must be >= HUF_WORKSPACE_SIZE */
-                       HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat, int bmi2, unsigned suspectUncompressible);
+                       HUF_CElt* hufTable, HUF_repeat* repeat, int flags);
 
 /* HUF_buildCTable_wksp() :
 *  Same as HUF_buildCTable(), but using externally allocated scratch buffer.
 * `workSpace` must be aligned on 4-bytes boundaries, and its size must be >= HUF_CTABLE_WORKSPACE_SIZE.
 */
-#define HUF_CTABLE_WORKSPACE_SIZE_U32 (2*HUF_SYMBOLVALUE_MAX +1 +1)
+#define HUF_CTABLE_WORKSPACE_SIZE_U32 ((4 * (HUF_SYMBOLVALUE_MAX + 1)) + 192)
 #define HUF_CTABLE_WORKSPACE_SIZE (HUF_CTABLE_WORKSPACE_SIZE_U32 * sizeof(unsigned))
 size_t HUF_buildCTable_wksp (HUF_CElt* tree,
                       const unsigned* count, U32 maxSymbolValue, U32 maxNbBits,
@@ -238,7 +185,7 @@ size_t HUF_readStats_wksp(BYTE* huffWeight, size_t hwSize,
                          U32* rankStats, U32* nbSymbolsPtr, U32* tableLogPtr,
                          const void* src, size_t srcSize,
                          void* workspace, size_t wkspSize,
-                          int bmi2);
+                          int flags);
 
 /* HUF_readCTable() :
 *  Loading a CTable saved with HUF_writeCTable() */
@@ -246,9 +193,22 @@ size_t HUF_readCTable (HUF_CElt* CTable, unsigned* maxSymbolValuePtr, const void
 
 /* HUF_getNbBitsFromCTable() :
 *  Read nbBits from CTable symbolTable, for symbol `symbolValue` presumed <= HUF_SYMBOLVALUE_MAX
- *  Note 1 : is not inlined, as HUF_CElt definition is private */
+ *  Note 1 : If symbolValue > HUF_readCTableHeader(symbolTable).maxSymbolValue, returns 0
+ *  Note 2 : is not inlined, as HUF_CElt definition is private
+ */
 U32 HUF_getNbBitsFromCTable(const HUF_CElt* symbolTable, U32 symbolValue);
 
+typedef struct {
+    BYTE tableLog;
+    BYTE maxSymbolValue;
+    BYTE unused[sizeof(size_t) - 2];
+} HUF_CTableHeader;
+
+/* HUF_readCTableHeader() :
+ * @returns The header from the CTable specifying the tableLog and the maxSymbolValue.
+ */
+HUF_CTableHeader HUF_readCTableHeader(HUF_CElt const* ctable);
+
 /*
 * HUF_decompress() does the following:
 * 1. select the decompression algorithm (X1, X2) based on pre-computed heuristics
@@ -276,32 +236,12 @@ U32 HUF_selectDecoder (size_t dstSize, size_t cSrcSize);
 #define HUF_DECOMPRESS_WORKSPACE_SIZE ((2 << 10) + (1 << 9))
 #define HUF_DECOMPRESS_WORKSPACE_SIZE_U32 (HUF_DECOMPRESS_WORKSPACE_SIZE / sizeof(U32))
 
-#ifndef HUF_FORCE_DECOMPRESS_X2
-size_t HUF_readDTableX1 (HUF_DTable* DTable, const void* src, size_t srcSize);
-size_t HUF_readDTableX1_wksp (HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize);
-#endif
-#ifndef HUF_FORCE_DECOMPRESS_X1
-size_t HUF_readDTableX2 (HUF_DTable* DTable, const void* src, size_t srcSize);
-size_t HUF_readDTableX2_wksp (HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize);
-#endif
-
-size_t HUF_decompress4X_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);
-#ifndef HUF_FORCE_DECOMPRESS_X2
-size_t HUF_decompress4X1_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);
-#endif
-#ifndef HUF_FORCE_DECOMPRESS_X1
-size_t HUF_decompress4X2_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);
-#endif
-
 
 /* ====================== */
 /* single stream variants */
 /* ====================== */
 
-size_t HUF_compress1X (void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog);
-size_t HUF_compress1X_wksp (void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void* workSpace, size_t wkspSize);  /*< `workSpace` must be a table of at least HUF_WORKSPACE_SIZE_U64 U64 */
-size_t HUF_compress1X_usingCTable(void* dst, size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable);
-size_t HUF_compress1X_usingCTable_bmi2(void* dst, size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable, int bmi2);
+size_t HUF_compress1X_usingCTable(void* dst, size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable, int flags);
 /* HUF_compress1X_repeat() :
 *  Same as HUF_compress1X_wksp(), but considers using hufTable if *repeat != HUF_repeat_none.
 *  If it uses hufTable it does not modify hufTable or repeat.
@@ -312,47 +252,27 @@ size_t HUF_compress1X_repeat(void* dst, size_t dstSize,
                       const void* src, size_t srcSize,
                       unsigned maxSymbolValue, unsigned tableLog,
                       void* workSpace, size_t wkspSize,   /*< `workSpace` must be aligned on 4-bytes boundaries, `wkspSize` must be >= HUF_WORKSPACE_SIZE */
-                       HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat, int bmi2, unsigned suspectUncompressible);
-
-size_t HUF_decompress1X1 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /* single-symbol decoder */
-#ifndef HUF_FORCE_DECOMPRESS_X1
-size_t HUF_decompress1X2 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /* double-symbol decoder */
-#endif
-
-size_t HUF_decompress1X_DCtx (HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);
-size_t HUF_decompress1X_DCtx_wksp (HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize);
-#ifndef HUF_FORCE_DECOMPRESS_X2
-size_t HUF_decompress1X1_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /*< single-symbol decoder */
-size_t HUF_decompress1X1_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize);   /*< single-symbol decoder */
-#endif
-#ifndef HUF_FORCE_DECOMPRESS_X1
-size_t HUF_decompress1X2_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /*< double-symbols decoder */
-size_t HUF_decompress1X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize);   /*< double-symbols decoder */
-#endif
+                       HUF_CElt* hufTable, HUF_repeat* repeat, int flags);
 
-size_t HUF_decompress1X_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);   /*< automatic selection of sing or double symbol decoder, based on DTable */
-#ifndef HUF_FORCE_DECOMPRESS_X2
-size_t HUF_decompress1X1_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);
-#endif
+size_t HUF_decompress1X_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int flags);
 #ifndef HUF_FORCE_DECOMPRESS_X1
-size_t HUF_decompress1X2_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);
+size_t HUF_decompress1X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int flags);   /*< double-symbols decoder */
 #endif
 
 /* BMI2 variants.
 * If the CPU has BMI2 support, pass bmi2=1, otherwise pass bmi2=0.
 */
-size_t HUF_decompress1X_usingDTable_bmi2(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int bmi2);
+size_t HUF_decompress1X_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int flags);
 #ifndef HUF_FORCE_DECOMPRESS_X2
-size_t HUF_decompress1X1_DCtx_wksp_bmi2(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int bmi2);
+size_t HUF_decompress1X1_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int flags);
 #endif
-size_t HUF_decompress4X_usingDTable_bmi2(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int bmi2);
-size_t HUF_decompress4X_hufOnly_wksp_bmi2(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int bmi2);
+size_t HUF_decompress4X_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int flags);
+size_t HUF_decompress4X_hufOnly_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int flags);
 #ifndef HUF_FORCE_DECOMPRESS_X2
-size_t HUF_readDTableX1_wksp_bmi2(HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize, int bmi2);
+size_t HUF_readDTableX1_wksp(HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize, int flags);
 #endif
 #ifndef HUF_FORCE_DECOMPRESS_X1
-size_t HUF_readDTableX2_wksp_bmi2(HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize, int bmi2);
+size_t HUF_readDTableX2_wksp(HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize, int flags);
 #endif
 
-#endif /* HUF_STATIC_LINKING_ONLY */
-
+#endif   /* HUF_H_298734234 */
diff --git a/lib/zstd/common/mem.h b/lib/zstd/common/mem.h
index c22a2e69bf46..d9bd752fe17b 100644
--- a/lib/zstd/common/mem.h
+++ b/lib/zstd/common/mem.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under both the BSD-style license (found in the
@@ -24,6 +24,7 @@
 /*-****************************************
 *  Compiler specifics
 ******************************************/
+#undef MEM_STATIC /* may be already defined from common/compiler.h */
 #define MEM_STATIC static inline
 
 /*-**************************************************************
diff --git a/lib/zstd/common/portability_macros.h b/lib/zstd/common/portability_macros.h
index 0e3b2c0a527d..05286af72683 100644
--- a/lib/zstd/common/portability_macros.h
+++ b/lib/zstd/common/portability_macros.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under both the BSD-style license (found in the
@@ -12,7 +13,7 @@
 #define ZSTD_PORTABILITY_MACROS_H
 
 /*
- * This header file contains macro defintions to support portability.
+ * This header file contains macro definitions to support portability.
 * This header is shared between C and ASM code, so it MUST only
 * contain macro definitions. It MUST not contain any C code.
 *
@@ -45,30 +46,35 @@
 /* Mark the internal assembly functions as hidden  */
 #ifdef __ELF__
 # define ZSTD_HIDE_ASM_FUNCTION(func) .hidden func
+#elif defined(__APPLE__)
+# define ZSTD_HIDE_ASM_FUNCTION(func) .private_extern func
 #else
 # define ZSTD_HIDE_ASM_FUNCTION(func)
 #endif
 
+/* Compile time determination of BMI2 support */
+
+
 /* Enable runtime BMI2 dispatch based on the CPU.
 * Enabled for clang & gcc >=4.8 on x86 when BMI2 isn't enabled by default.
 */
 #ifndef DYNAMIC_BMI2
-  #if ((defined(__clang__) && __has_attribute(__target__)) \
+# if ((defined(__clang__) && __has_attribute(__target__)) \
      || (defined(__GNUC__) \
          && (__GNUC__ >= 5 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8)))) \
-      && (defined(__x86_64__) || defined(_M_X64)) \
+      && (defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)) \
      && !defined(__BMI2__)
-  #  define DYNAMIC_BMI2 1
-  #else
-  #  define DYNAMIC_BMI2 0
-  #endif
+#  define DYNAMIC_BMI2 1
+# else
+#  define DYNAMIC_BMI2 0
+# endif
 #endif
 
 /*
- * Only enable assembly for GNUC comptabile compilers,
+ * Only enable assembly for GNU C compatible compilers,
 * because other platforms may not support GAS assembly syntax.
 *
- * Only enable assembly for Linux / MacOS, other platforms may
+ * Only enable assembly for Linux / MacOS / Win32, other platforms may
 * work, but they haven't been tested. This could likely be
 * extended to BSD systems.
 *
@@ -90,4 +96,23 @@
 */
 #define ZSTD_ENABLE_ASM_X86_64_BMI2 0
 
+/*
+ * For x86 ELF targets, add .note.gnu.property section for Intel CET in
+ * assembly sources when CET is enabled.
+ *
+ * Additionally, any function that may be called indirectly must begin
+ * with ZSTD_CET_ENDBRANCH.
+ */
+#if defined(__ELF__) && (defined(__x86_64__) || defined(__i386__)) \
+    && defined(__has_include)
+# if __has_include(<cet.h>)
+#  include <cet.h>
+#  define ZSTD_CET_ENDBRANCH _CET_ENDBR
+# endif
+#endif
+
+#ifndef ZSTD_CET_ENDBRANCH
+# define ZSTD_CET_ENDBRANCH
+#endif
+
 #endif /* ZSTD_PORTABILITY_MACROS_H */
diff --git a/lib/zstd/common/zstd_common.c b/lib/zstd/common/zstd_common.c
index 3d7e35b309b5..44b95b25344a 100644
--- a/lib/zstd/common/zstd_common.c
+++ b/lib/zstd/common/zstd_common.c
@@ -1,5 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under both the BSD-style license (found in the
@@ -14,7 +15,6 @@
 *  Dependencies
 ***************************************/
 #define ZSTD_DEPS_NEED_MALLOC
-#include "zstd_deps.h"   /* ZSTD_malloc, ZSTD_calloc, ZSTD_free, ZSTD_memset */
 #include "error_private.h"
 #include "zstd_internal.h"
 
@@ -47,37 +47,3 @@ ZSTD_ErrorCode ZSTD_getErrorCode(size_t code) { return ERR_getErrorCode(code); }
 /*! ZSTD_getErrorString() :
 *  provides error code string from enum */
 const char* ZSTD_getErrorString(ZSTD_ErrorCode code) { return ERR_getErrorString(code); }
-
-
-
-/*=**************************************************************
-*  Custom allocator
-****************************************************************/
-void* ZSTD_customMalloc(size_t size, ZSTD_customMem customMem)
-{
-    if (customMem.customAlloc)
-        return customMem.customAlloc(customMem.opaque, size);
-    return ZSTD_malloc(size);
-}
-
-void* ZSTD_customCalloc(size_t size, ZSTD_customMem customMem)
-{
-    if (customMem.customAlloc) {
-        /* calloc implemented as malloc+memset;
-         * not as efficient as calloc, but next best guess for custom malloc */
-        void* const ptr = customMem.customAlloc(customMem.opaque, size);
-        ZSTD_memset(ptr, 0, size);
-        return ptr;
-    }
-    return ZSTD_calloc(1, size);
-}
-
-void ZSTD_customFree(void* ptr, ZSTD_customMem customMem)
-{
-    if (ptr!=NULL) {
-        if (customMem.customFree)
-            customMem.customFree(customMem.opaque, ptr);
-        else
-            ZSTD_free(ptr);
-    }
-}
diff --git a/lib/zstd/common/zstd_deps.h b/lib/zstd/common/zstd_deps.h
index 2c34e8a33a1c..f931f7d0e294 100644
--- a/lib/zstd/common/zstd_deps.h
+++ b/lib/zstd/common/zstd_deps.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under both the BSD-style license (found in the
@@ -105,3 +105,17 @@ static uint64_t ZSTD_div64(uint64_t dividend, uint32_t divisor) {
 
 #endif /* ZSTD_DEPS_IO */
 #endif /* ZSTD_DEPS_NEED_IO */
+
+/*
+ * Only requested when MSAN is enabled.
+ * Need:
+ * intptr_t
+ */
+#ifdef ZSTD_DEPS_NEED_STDINT
+#ifndef ZSTD_DEPS_STDINT
+#define ZSTD_DEPS_STDINT
+
+/* intptr_t already provided by ZSTD_DEPS_COMMON */
+
+#endif /* ZSTD_DEPS_STDINT */
+#endif /* ZSTD_DEPS_NEED_STDINT */
diff --git a/lib/zstd/common/zstd_internal.h b/lib/zstd/common/zstd_internal.h
index 93305d9b41bb..52a79435caf6 100644
--- a/lib/zstd/common/zstd_internal.h
+++ b/lib/zstd/common/zstd_internal.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under both the BSD-style license (found in the
@@ -28,12 +29,10 @@
 #include <linux/zstd.h>
 #define FSE_STATIC_LINKING_ONLY
 #include "fse.h"
-#define HUF_STATIC_LINKING_ONLY
 #include "huf.h"
 #include <linux/xxhash.h>                /* XXH_reset, update, digest */
 #define ZSTD_TRACE 0
 
-
 /* ---- static assert (debug) --- */
 #define ZSTD_STATIC_ASSERT(c) DEBUG_STATIC_ASSERT(c)
 #define ZSTD_isError ERR_isError   /* for inlining */
@@ -83,16 +82,17 @@ typedef enum { bt_raw, bt_rle, bt_compressed, bt_reserved } blockType_e;
 #define ZSTD_FRAMECHECKSUMSIZE 4
 
 #define MIN_SEQUENCES_SIZE 1 /* nbSeq==0 */
-#define MIN_CBLOCK_SIZE (1 /*litCSize*/ + 1 /* RLE or RAW */ + MIN_SEQUENCES_SIZE /* nbSeq==0 */)   /* for a non-null block */
+#define MIN_CBLOCK_SIZE (1 /*litCSize*/ + 1 /* RLE or RAW */)   /* for a non-null block */
+#define MIN_LITERALS_FOR_4_STREAMS 6
 
-#define HufLog 12
-typedef enum { set_basic, set_rle, set_compressed, set_repeat } symbolEncodingType_e;
+typedef enum { set_basic, set_rle, set_compressed, set_repeat } SymbolEncodingType_e;
 
 #define LONGNBSEQ 0x7F00
 
 #define MINMATCH 3
 
 #define Litbits  8
+#define LitHufLog 11
 #define MaxLit ((1<= WILDCOPY_VECLEN || diff <= -WILDCOPY_VECLEN);
@@ -225,12 +227,6 @@ void ZSTD_wildcopy(void* dst, const void* src, ptrdiff_t length, ZSTD_overlap_e
         * one COPY16() in the first call. Then, do two calls per loop since
         * at that point it is more likely to have a high trip count.
         */
-#ifdef __aarch64__
-        do {
-            COPY16(op, ip);
-        }
-        while (op < oend);
-#else
        ZSTD_copy16(op, ip);
        if (16 >= length) return;
        op += 16;
@@ -240,7 +236,6 @@ void ZSTD_wildcopy(void* dst, const void* src, ptrdiff_t length, ZSTD_overlap_e
            COPY16(op, ip);
        }
        while (op < oend);
-#endif
    }
 }
 
@@ -273,62 +268,6 @@ typedef enum {
 /*-*******************************************
 *  Private declarations
 *********************************************/
-typedef struct seqDef_s {
-    U32 offBase;   /* offBase == Offset + ZSTD_REP_NUM, or repcode 1,2,3 */
-    U16 litLength;
-    U16 mlBase;    /* mlBase == matchLength - MINMATCH */
-} seqDef;
-
-/* Controls whether seqStore has a single "long" litLength or matchLength. See seqStore_t. */
-typedef enum {
-    ZSTD_llt_none = 0,             /* no longLengthType */
-    ZSTD_llt_literalLength = 1,    /* represents a long literal */
-    ZSTD_llt_matchLength = 2       /* represents a long match */
-} ZSTD_longLengthType_e;
-
-typedef struct {
-    seqDef* sequencesStart;
-    seqDef* sequences;      /* ptr to end of sequences */
-    BYTE* litStart;
-    BYTE* lit;              /* ptr to end of literals */
-    BYTE* llCode;
-    BYTE* mlCode;
-    BYTE* ofCode;
-    size_t maxNbSeq;
-    size_t maxNbLit;
-
-    /* longLengthPos and longLengthType to allow us to represent either a single litLength or matchLength
-     * in the seqStore that has a value larger than U16 (if it exists). To do so, we increment
-     * the existing value of the litLength or matchLength by 0x10000.
-     */
-    ZSTD_longLengthType_e longLengthType;
-    U32                   longLengthPos;   /* Index of the sequence to apply long length modification to */
-} seqStore_t;
-
-typedef struct {
-    U32 litLength;
-    U32 matchLength;
-} ZSTD_sequenceLength;
-
-/*
- * Returns the ZSTD_sequenceLength for the given sequences. It handles the decoding of long sequences
- * indicated by longLengthPos and longLengthType, and adds MINMATCH back to matchLength.
- */
-MEM_STATIC ZSTD_sequenceLength ZSTD_getSequenceLength(seqStore_t const* seqStore, seqDef const* seq)
-{
-    ZSTD_sequenceLength seqLen;
-    seqLen.litLength = seq->litLength;
-    seqLen.matchLength = seq->mlBase + MINMATCH;
-    if (seqStore->longLengthPos == (U32)(seq - seqStore->sequencesStart)) {
-        if (seqStore->longLengthType == ZSTD_llt_literalLength) {
-            seqLen.litLength += 0xFFFF;
-        }
-        if (seqStore->longLengthType == ZSTD_llt_matchLength) {
-            seqLen.matchLength += 0xFFFF;
-        }
-    }
-    return seqLen;
-}
 
 /*
 * Contains the compressed frame size and an upper-bound for the decompressed frame size.
@@ -337,74 +276,11 @@ MEM_STATIC ZSTD_sequenceLength ZSTD_getSequenceLength(seqStore_t const* seqStore
 * `decompressedBound != ZSTD_CONTENTSIZE_ERROR`
 */
 typedef struct {
+    size_t nbBlocks;
    size_t compressedSize;
    unsigned long long decompressedBound;
 } ZSTD_frameSizeInfo;   /* decompress & legacy */
 
-const seqStore_t* ZSTD_getSeqStore(const ZSTD_CCtx* ctx);   /* compress & dictBuilder */
-void ZSTD_seqToCodes(const seqStore_t* seqStorePtr);   /* compress, dictBuilder, decodeCorpus (shouldn't get its definition from here) */
-
-/* custom memory allocation functions */
-void* ZSTD_customMalloc(size_t size, ZSTD_customMem customMem);
-void* ZSTD_customCalloc(size_t size, ZSTD_customMem customMem);
-void ZSTD_customFree(void* ptr, ZSTD_customMem customMem);
-
-
-MEM_STATIC U32 ZSTD_highbit32(U32 val)   /* compress, dictBuilder, decodeCorpus */
-{
-    assert(val != 0);
-    {
-#   if (__GNUC__ >= 3)   /* GCC Intrinsic */
-        return __builtin_clz (val) ^ 31;
-#   else   /* Software version */
-        static const U32 DeBruijnClz[32] = { 0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, 8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31 };
-        U32 v = val;
-        v |= v >> 1;
-        v |= v >> 2;
-        v |= v >> 4;
-        v |= v >> 8;
-        v |= v >> 16;
-        return DeBruijnClz[(v * 0x07C4ACDDU) >> 27];
-#   endif
-    }
-}
-
-/*
- * Counts the number of trailing zeros of a `size_t`.
- * Most compilers should support CTZ as a builtin. A backup
- * implementation is provided if the builtin isn't supported, but
- * it may not be terribly efficient.
- */
-MEM_STATIC unsigned ZSTD_countTrailingZeros(size_t val)
-{
-    if (MEM_64bits()) {
-#       if (__GNUC__ >= 4)
-            return __builtin_ctzll((U64)val);
-#       else
-            static const int DeBruijnBytePos[64] = {  0,  1,  2,  7,  3, 13,  8, 19,
                                                      4, 25, 14, 28,  9, 34, 20, 56,
                                                      5, 17, 26, 54, 15, 41, 29, 43,
                                                     10, 31, 38, 35, 21, 45, 49, 57,
                                                     63,  6, 12, 18, 24, 27, 33, 55,
                                                     16, 53, 40, 42, 30, 37, 44, 48,
                                                     62, 11, 23, 32, 52, 39, 36, 47,
                                                     61, 22, 51, 46, 60, 50, 59, 58 };
-            return DeBruijnBytePos[((U64)((val & -(long long)val) * 0x0218A392CDABBD3FULL)) >> 58];
-#       endif
-    } else { /* 32 bits */
-#       if (__GNUC__ >= 3)
-            return __builtin_ctz((U32)val);
-#       else
-            static const int DeBruijnBytePos[32] = {  0,  1, 28,  2, 29, 14, 24,  3,
                                                     30, 22, 20, 15, 25, 17,  4,  8,
                                                     31, 27, 13, 23, 21, 19, 16,  7,
                                                     26, 12, 18,  6, 11,  5, 10,  9 };
-            return DeBruijnBytePos[((U32)((val & -(S32)val) * 0x077CB531U)) >> 27];
-#       endif
-    }
-}
-
-
 /* ZSTD_invalidateRepCodes() :
 * ensures next compression will not use repcodes from previous block.
 * Note : only works with regular variant;
@@ -420,13 +296,13 @@ typedef struct {
 
 /*! ZSTD_getcBlockSize() :
 *  Provides the size of compressed block from block header `src` */
-/* Used by: decompress, fullbench (does not get its definition from here) */
+/* Used by: decompress, fullbench */
 size_t ZSTD_getcBlockSize(const void* src, size_t srcSize,
                          blockProperties_t* bpPtr);
 
 /*! ZSTD_decodeSeqHeaders() :
 *  decode sequence header from src */
-/* Used by: decompress, fullbench (does not get its definition from here) */
+/* Used by: zstd_decompress_block, fullbench */
 size_t ZSTD_decodeSeqHeaders(ZSTD_DCtx* dctx, int* nbSeqPtr,
                       const void* src, size_t srcSize);
 
@@ -439,5 +315,4 @@ MEM_STATIC int ZSTD_cpuSupportsBmi2(void)
    return ZSTD_cpuid_bmi1(cpuid) && ZSTD_cpuid_bmi2(cpuid);
 }
 
-
 #endif   /* ZSTD_CCOMMON_H_MODULE */
diff --git a/lib/zstd/compress/clevels.h b/lib/zstd/compress/clevels.h
index d9a76112ec3a..6ab8be6532ef 100644
--- a/lib/zstd/compress/clevels.h
+++ b/lib/zstd/compress/clevels.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under both the BSD-style license (found in the
diff --git a/lib/zstd/compress/fse_compress.c b/lib/zstd/compress/fse_compress.c
index ec5b1ca6d71a..44a3c10becf2 100644
--- a/lib/zstd/compress/fse_compress.c
+++ b/lib/zstd/compress/fse_compress.c
@@ -1,6 +1,7 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
 /* ******************************************************************
 * FSE : Finite State Entropy encoder
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 *
 * You can contact the author at :
 * - FSE source repository : https://github.com/Cyan4973/FiniteStateEntropy
@@ -25,7 +26,8 @@
 #include "../common/error_private.h"
 #define ZSTD_DEPS_NEED_MALLOC
 #define ZSTD_DEPS_NEED_MATH64
-#include "../common/zstd_deps.h"  /* ZSTD_malloc, ZSTD_free, ZSTD_memcpy, ZSTD_memset */
+#include "../common/zstd_deps.h"  /* ZSTD_memset */
+#include "../common/bits.h"       /* ZSTD_highbit32 */
 
 
 /* **************************************************************
@@ -90,7 +92,7 @@ size_t FSE_buildCTable_wksp(FSE_CTable* ct,
    assert(tableLog < 16);   /* required for threshold strategy to work */
 
    /* For explanations on how to distribute symbol values over the table :
-     * http://fastcompression.blogspot.fr/2014/02/fse-distributing-symbol-values.html */
+     * https://fastcompression.blogspot.fr/2014/02/fse-distributing-symbol-values.html */
 
    #ifdef __clang_analyzer__
    ZSTD_memset(tableSymbol, 0, sizeof(*tableSymbol) * tableSize);   /* useless initialization, just to keep scan-build happy */
@@ -191,7 +193,7 @@ size_t FSE_buildCTable_wksp(FSE_CTable* ct,
            break;
        default :
            assert(normalizedCounter[s] > 1);
-            {   U32 const maxBitsOut = tableLog - BIT_highbit32 ((U32)normalizedCounter[s]-1);
+            {   U32 const maxBitsOut = tableLog - ZSTD_highbit32 ((U32)normalizedCounter[s]-1);
                U32 const minStatePlus = (U32)normalizedCounter[s] << maxBitsOut;
                symbolTT[s].deltaNbBits = (maxBitsOut << 16) - minStatePlus;
                symbolTT[s].deltaFindState = (int)(total - (unsigned)normalizedCounter[s]);
@@ -224,8 +226,8 @@ size_t FSE_NCountWriteBound(unsigned maxSymbolValue, unsigned tableLog)
    size_t const maxHeaderSize = (((maxSymbolValue+1) * tableLog
                                   + 4 /* bitCount initialized at 4 */
                                   + 2 /* first two symbols may use one additional bit each */) / 8)
-                                  + 1 /* round up to whole nb bytes */
-                                  + 2 /* additional two bytes for bitstream flush */;
+                                 + 1 /* round up to whole nb bytes */
+                                 + 2 /* additional two bytes for bitstream flush */;
    return maxSymbolValue ? maxHeaderSize : FSE_NCOUNTBOUND;   /* maxSymbolValue==0 ? use default */
 }
 
@@ -254,7 +256,7 @@ FSE_writeNCount_generic (void* header, size_t headerBufferSize,
    /* Init */
    remaining = tableSize+1;   /* +1 for extra accuracy */
    threshold = tableSize;
-    nbBits = tableLog+1;
+    nbBits = (int)tableLog+1;
 
    while ((symbol < alphabetSize) && (remaining>1)) {  /* stops at 1 */
        if (previousIs0) {
@@ -273,7 +275,7 @@
            }
            while (symbol >= start+3) {
                start+=3;
-                bitStream += 3 << bitCount;
+                bitStream += 3U << bitCount;
                bitCount += 2;
            }
            bitStream += (symbol-start) << bitCount;
@@ -293,7 +295,7 @@
        count++;   /* +1 for extra accuracy */
        if (count>=threshold) count += max;   /* [0..max[ [max..threshold[ (...) [threshold+max 2*threshold[ */
-        bitStream += count << bitCount;
+        bitStream += (U32)count << bitCount;
        bitCount  += nbBits;
        bitCount  -= (count>8);
        out+= (bitCount+7) /8;
 
-    return (out-ostart);
+    assert(out >= ostart);
+    return (size_t)(out-ostart);
 }
 
 
@@ -342,21 +345,11 @@ size_t FSE_writeNCount (void* buffer, size_t bufferSize,
 *  FSE Compression Code
 ****************************************************************/
 
-FSE_CTable* FSE_createCTable (unsigned maxSymbolValue, unsigned tableLog)
-{
-    size_t size;
-    if (tableLog > FSE_TABLELOG_ABSOLUTE_MAX) tableLog = FSE_TABLELOG_ABSOLUTE_MAX;
-    size = FSE_CTABLE_SIZE_U32 (tableLog, maxSymbolValue) * sizeof(U32);
-    return (FSE_CTable*)ZSTD_malloc(size);
-}
-
-void FSE_freeCTable (FSE_CTable* ct) { ZSTD_free(ct); }
-
 /* provides the minimum logSize to safely represent a distribution */
 static unsigned FSE_minTableLog(size_t srcSize, unsigned maxSymbolValue)
 {
-    U32 minBitsSrc = BIT_highbit32((U32)(srcSize)) + 1;
-    U32 minBitsSymbols = BIT_highbit32(maxSymbolValue) + 2;
+    U32 minBitsSrc = ZSTD_highbit32((U32)(srcSize)) + 1;
+    U32 minBitsSymbols = ZSTD_highbit32(maxSymbolValue) + 2;
    U32 minBits = minBitsSrc < minBitsSymbols ? minBitsSrc : minBitsSymbols;
    assert(srcSize > 1); /* Not supported, RLE should be used instead */
    return minBits;
@@ -364,7 +357,7 @@ static unsigned FSE_minTableLog(size_t srcSize, unsigned maxSymbolValue)
 
 unsigned FSE_optimalTableLog_internal(unsigned maxTableLog, size_t srcSize, unsigned maxSymbolValue, unsigned minus)
 {
-    U32 maxBitsSrc = BIT_highbit32((U32)(srcSize - 1)) - minus;
+    U32 maxBitsSrc = ZSTD_highbit32((U32)(srcSize - 1)) - minus;
    U32 tableLog = maxTableLog;
    U32 minBits = FSE_minTableLog(srcSize, maxSymbolValue);
    assert(srcSize > 1); /* Not supported, RLE should be used instead */
@@ -532,40 +525,6 @@ size_t FSE_normalizeCount (short* normalizedCounter, unsigned tableLog,
    return tableLog;
 }
 
-
-/* fake FSE_CTable, for raw (uncompressed) input */
-size_t FSE_buildCTable_raw (FSE_CTable* ct, unsigned nbBits)
-{
-    const unsigned tableSize = 1 << nbBits;
-    const unsigned tableMask = tableSize - 1;
-    const unsigned maxSymbolValue = tableMask;
-    void* const ptr = ct;
-    U16* const tableU16 = ( (U16*) ptr) + 2;
-    void* const FSCT = ((U32*)ptr) + 1 /* header */ + (tableSize>>1);   /* assumption : tableLog >= 1 */
-    FSE_symbolCompressionTransform* const symbolTT = (FSE_symbolCompressionTransform*) (FSCT);
-    unsigned s;
-
-    /* Sanity checks */
-    if (nbBits < 1) return ERROR(GENERIC);             /* min size */
-
-    /* header */
-    tableU16[-2] = (U16) nbBits;
-    tableU16[-1] = (U16) maxSymbolValue;
-
-    /* Build table */
-    for (s=0; s= 2
+
+static size_t showU32(const U32* arr, size_t size)
 {
-    return FSE_optimalTableLog_internal(maxTableLog, srcSize, maxSymbolValue, 1);
+    size_t u;
+    for (u=0; u= sizeof(HUF_WriteCTableWksp));
+
+    assert(HUF_readCTableHeader(CTable).maxSymbolValue == maxSymbolValue);
+    assert(HUF_readCTableHeader(CTable).tableLog == huffLog);
+
    /* check conditions */
    if (workspaceSize < sizeof(HUF_WriteCTableWksp)) return ERROR(GENERIC);
    if (maxSymbolValue > HUF_SYMBOLVALUE_MAX) return ERROR(maxSymbolValue_tooLarge);
@@ -204,16 +286,6 @@ size_t HUF_writeCTable_wksp(void* dst, size_t maxDstSize,
    return ((maxSymbolValue+1)/2) + 1;
 }
 
-/*! HUF_writeCTable() :
-    `CTable` : Huffman tree to save, using huf representation.
-    @return : size of saved CTable */
-size_t HUF_writeCTable (void* dst, size_t maxDstSize,
-                        const HUF_CElt* CTable, unsigned maxSymbolValue, unsigned huffLog)
-{
-    HUF_WriteCTableWksp wksp;
-    return HUF_writeCTable_wksp(dst, maxDstSize, CTable, maxSymbolValue, huffLog, &wksp, sizeof(wksp));
-}
-
 
 size_t HUF_readCTable (HUF_CElt* CTable, unsigned* maxSymbolValuePtr, const void* src, size_t srcSize, unsigned* hasZeroWeights)
 {
@@ -231,7 +303,9 @@ size_t HUF_readCTable (HUF_CElt* CTable, unsigned* maxSymbolValuePtr, const void
    if (tableLog > HUF_TABLELOG_MAX) return ERROR(tableLog_tooLarge);
    if (nbSymbols > *maxSymbolValuePtr+1) return ERROR(maxSymbolValue_tooSmall);
 
-    CTable[0] = tableLog;
+    *maxSymbolValuePtr = nbSymbols - 1;
+
+    HUF_writeCTableHeader(CTable, tableLog, *maxSymbolValuePtr);
 
    /* Prepare base value per rank */
    {   U32 n, nextRankStart = 0;
@@ -263,74 +337,71 @@ size_t HUF_readCTable (HUF_CElt* CTable, unsigned* maxSymbolValuePtr, const void
    { U32 n; for (n=0; n HUF_readCTableHeader(CTable).maxSymbolValue)
+        return 0;
    return (U32)HUF_getNbBits(ct[symbolValue]);
 }
 
 
-typedef struct nodeElt_s {
-    U32 count;
-    U16 parent;
-    BYTE byte;
-    BYTE nbBits;
-} nodeElt;
-
 /*
 * HUF_setMaxHeight():
- * Enforces maxNbBits on the Huffman tree described in huffNode.
+ * Try to enforce @targetNbBits on the Huffman tree described in @huffNode. * - * It sets all nodes with nbBits > maxNbBits to be maxNbBits. Then it adju= sts - * the tree to so that it is a valid canonical Huffman tree. + * It attempts to convert all nodes with nbBits > @targetNbBits + * to employ @targetNbBits instead. Then it adjusts the tree + * so that it remains a valid canonical Huffman tree. * * @pre The sum of the ranks of each symbol =3D=3D 2^largest= Bits, * where largestBits =3D=3D huffNode[lastNonNull].nbBit= s. * @post The sum of the ranks of each symbol =3D=3D 2^largest= Bits, - * where largestBits is the return value <=3D maxNbBits. + * where largestBits is the return value (expected <=3D= targetNbBits). * - * @param huffNode The Huffman tree modified in place to enforce maxNbB= its. + * @param huffNode The Huffman tree modified in place to enforce target= NbBits. + * It's presumed sorted, from most frequent to rarest s= ymbol. * @param lastNonNull The symbol with the lowest count in the Huffman tree. - * @param maxNbBits The maximum allowed number of bits, which the Huffma= n tree + * @param targetNbBits The allowed number of bits, which the Huffman tree * may not respect. After this function the Huffman tre= e will - * respect maxNbBits. - * @return The maximum number of bits of the Huffman tree after= adjustment, - * necessarily no more than maxNbBits. + * respect targetNbBits. + * @return The maximum number of bits of the Huffman tree after= adjustment. */ -static U32 HUF_setMaxHeight(nodeElt* huffNode, U32 lastNonNull, U32 maxNbB= its) +static U32 HUF_setMaxHeight(nodeElt* huffNode, U32 lastNonNull, U32 target= NbBits) { const U32 largestBits =3D huffNode[lastNonNull].nbBits; - /* early exit : no elt > maxNbBits, so the tree is already valid. */ - if (largestBits <=3D maxNbBits) return largestBits; + /* early exit : no elt > targetNbBits, so the tree is already valid. */ + if (largestBits <=3D targetNbBits) return largestBits; + + DEBUGLOG(5, "HUF_setMaxHeight (targetNbBits =3D %u)", targetNbBits); =20 /* there are several too large elements (at least >=3D 2) */ { int totalCost =3D 0; - const U32 baseCost =3D 1 << (largestBits - maxNbBits); + const U32 baseCost =3D 1 << (largestBits - targetNbBits); int n =3D (int)lastNonNull; =20 - /* Adjust any ranks > maxNbBits to maxNbBits. + /* Adjust any ranks > targetNbBits to targetNbBits. * Compute totalCost, which is how far the sum of the ranks is * we are over 2^largestBits after adjust the offending ranks. 
*/ - while (huffNode[n].nbBits > maxNbBits) { + while (huffNode[n].nbBits > targetNbBits) { totalCost +=3D baseCost - (1 << (largestBits - huffNode[n].nbB= its)); - huffNode[n].nbBits =3D (BYTE)maxNbBits; + huffNode[n].nbBits =3D (BYTE)targetNbBits; n--; } - /* n stops at huffNode[n].nbBits <=3D maxNbBits */ - assert(huffNode[n].nbBits <=3D maxNbBits); - /* n end at index of smallest symbol using < maxNbBits */ - while (huffNode[n].nbBits =3D=3D maxNbBits) --n; + /* n stops at huffNode[n].nbBits <=3D targetNbBits */ + assert(huffNode[n].nbBits <=3D targetNbBits); + /* n end at index of smallest symbol using < targetNbBits */ + while (huffNode[n].nbBits =3D=3D targetNbBits) --n; =20 - /* renorm totalCost from 2^largestBits to 2^maxNbBits + /* renorm totalCost from 2^largestBits to 2^targetNbBits * note : totalCost is necessarily a multiple of baseCost */ - assert((totalCost & (baseCost - 1)) =3D=3D 0); - totalCost >>=3D (largestBits - maxNbBits); + assert(((U32)totalCost & (baseCost - 1)) =3D=3D 0); + totalCost >>=3D (largestBits - targetNbBits); assert(totalCost > 0); =20 /* repay normalized cost */ @@ -339,19 +410,19 @@ static U32 HUF_setMaxHeight(nodeElt* huffNode, U32 la= stNonNull, U32 maxNbBits) =20 /* Get pos of last (smallest =3D lowest cum. count) symbol per= rank */ ZSTD_memset(rankLast, 0xF0, sizeof(rankLast)); - { U32 currentNbBits =3D maxNbBits; + { U32 currentNbBits =3D targetNbBits; int pos; for (pos=3Dn ; pos >=3D 0; pos--) { if (huffNode[pos].nbBits >=3D currentNbBits) continue; - currentNbBits =3D huffNode[pos].nbBits; /* < maxNbBi= ts */ - rankLast[maxNbBits-currentNbBits] =3D (U32)pos; + currentNbBits =3D huffNode[pos].nbBits; /* < targetN= bBits */ + rankLast[targetNbBits-currentNbBits] =3D (U32)pos; } } =20 while (totalCost > 0) { /* Try to reduce the next power of 2 above totalCost becau= se we * gain back half the rank. */ - U32 nBitsToDecrease =3D BIT_highbit32((U32)totalCost) + 1; + U32 nBitsToDecrease =3D ZSTD_highbit32((U32)totalCost) + 1; for ( ; nBitsToDecrease > 1; nBitsToDecrease--) { U32 const highPos =3D rankLast[nBitsToDecrease]; U32 const lowPos =3D rankLast[nBitsToDecrease-1]; @@ -391,7 +462,7 @@ static U32 HUF_setMaxHeight(nodeElt* huffNode, U32 last= NonNull, U32 maxNbBits) rankLast[nBitsToDecrease] =3D noSymbol; else { rankLast[nBitsToDecrease]--; - if (huffNode[rankLast[nBitsToDecrease]].nbBits !=3D ma= xNbBits-nBitsToDecrease) + if (huffNode[rankLast[nBitsToDecrease]].nbBits !=3D ta= rgetNbBits-nBitsToDecrease) rankLast[nBitsToDecrease] =3D noSymbol; /* this = rank is now empty */ } } /* while (totalCost > 0) */ @@ -403,11 +474,11 @@ static U32 HUF_setMaxHeight(nodeElt* huffNode, U32 la= stNonNull, U32 maxNbBits) * TODO. */ while (totalCost < 0) { /* Sometimes, cost correction oversho= ot */ - /* special case : no rank 1 symbol (using maxNbBits-1); - * let's create one from largest rank 0 (using maxNbBits). + /* special case : no rank 1 symbol (using targetNbBits-1); + * let's create one from largest rank 0 (using targetNbBit= s). 
*/ if (rankLast[1] =3D=3D noSymbol) { - while (huffNode[n].nbBits =3D=3D maxNbBits) n--; + while (huffNode[n].nbBits =3D=3D targetNbBits) n--; huffNode[n+1].nbBits--; assert(n >=3D 0); rankLast[1] =3D (U32)(n+1); @@ -421,7 +492,7 @@ static U32 HUF_setMaxHeight(nodeElt* huffNode, U32 last= NonNull, U32 maxNbBits) } /* repay normalized cost */ } /* there are several too large elements (at least >=3D 2) */ =20 - return maxNbBits; + return targetNbBits; } =20 typedef struct { @@ -429,7 +500,7 @@ typedef struct { U16 curr; } rankPos; =20 -typedef nodeElt huffNodeTable[HUF_CTABLE_WORKSPACE_SIZE_U32]; +typedef nodeElt huffNodeTable[2 * (HUF_SYMBOLVALUE_MAX + 1)]; =20 /* Number of buckets available for HUF_sort() */ #define RANK_POSITION_TABLE_SIZE 192 @@ -448,8 +519,8 @@ typedef struct { * Let buckets 166 to 192 represent all remaining counts up to RANK_POSITI= ON_MAX_COUNT_LOG using log2 bucketing. */ #define RANK_POSITION_MAX_COUNT_LOG 32 -#define RANK_POSITION_LOG_BUCKETS_BEGIN (RANK_POSITION_TABLE_SIZE - 1) - R= ANK_POSITION_MAX_COUNT_LOG - 1 /* =3D=3D 158 */ -#define RANK_POSITION_DISTINCT_COUNT_CUTOFF RANK_POSITION_LOG_BUCKETS_BEGI= N + BIT_highbit32(RANK_POSITION_LOG_BUCKETS_BEGIN) /* =3D=3D 166 */ +#define RANK_POSITION_LOG_BUCKETS_BEGIN ((RANK_POSITION_TABLE_SIZE - 1) - = RANK_POSITION_MAX_COUNT_LOG - 1 /* =3D=3D 158 */) +#define RANK_POSITION_DISTINCT_COUNT_CUTOFF (RANK_POSITION_LOG_BUCKETS_BEG= IN + ZSTD_highbit32(RANK_POSITION_LOG_BUCKETS_BEGIN) /* =3D=3D 166 */) =20 /* Return the appropriate bucket index for a given count. See definition of * RANK_POSITION_DISTINCT_COUNT_CUTOFF for explanation of bucketing strate= gy. @@ -457,7 +528,7 @@ typedef struct { static U32 HUF_getIndex(U32 const count) { return (count < RANK_POSITION_DISTINCT_COUNT_CUTOFF) ? count - : BIT_highbit32(count) + RANK_POSITION_LOG_BUCKETS_BEGIN; + : ZSTD_highbit32(count) + RANK_POSITION_LOG_BUCKETS_BEGIN; } =20 /* Helper swap function for HUF_quickSortPartition() */ @@ -580,7 +651,7 @@ static void HUF_sort(nodeElt huffNode[], const unsigned= count[], U32 const maxSy =20 /* Sort each bucket. */ for (n =3D RANK_POSITION_DISTINCT_COUNT_CUTOFF; n < RANK_POSITION_TABL= E_SIZE - 1; ++n) { - U32 const bucketSize =3D rankPosition[n].curr-rankPosition[n].base; + int const bucketSize =3D rankPosition[n].curr - rankPosition[n].ba= se; U32 const bucketStartIdx =3D rankPosition[n].base; if (bucketSize > 1) { assert(bucketStartIdx < maxSymbolValue1); @@ -591,6 +662,7 @@ static void HUF_sort(nodeElt huffNode[], const unsigned= count[], U32 const maxSy assert(HUF_isSorted(huffNode, maxSymbolValue1)); } =20 + /* HUF_buildCTable_wksp() : * Same as HUF_buildCTable(), but using externally allocated scratch buff= er. * `workSpace` must be aligned on 4-bytes boundaries, and be at least as = large as sizeof(HUF_buildCTable_wksp_tables). 
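
The cost accounting in HUF_setMaxHeight() above is easiest to follow in Kraft-sum terms: a valid canonical tree satisfies sum(2^(largestBits - nbBits[s])) == 2^largestBits, so capping every over-long symbol at targetNbBits overshoots that budget, and totalCost tracks the overshoot until it is repaid by lengthening cheaper symbols. A self-contained restatement of the same arithmetic on a toy tree (hypothetical lengths, chosen to satisfy the Kraft equality):

    #include <assert.h>

    int main(void)
    {
        unsigned nbBits[5] = { 1, 2, 3, 4, 4 };     /* toy canonical lengths */
        unsigned const largestBits  = 4;
        unsigned const targetNbBits = 3;
        unsigned const baseCost = 1u << (largestBits - targetNbBits);  /* == 2 */
        unsigned totalCost = 0;
        int s = 4;

        /* cap over-long symbols, accumulating the Kraft overshoot */
        while (nbBits[s] > targetNbBits) {
            totalCost += baseCost - (1u << (largestBits - nbBits[s]));
            nbBits[s] = targetNbBits;
            s--;
        }
        assert((totalCost & (baseCost - 1)) == 0);   /* always a multiple of baseCost */
        totalCost >>= (largestBits - targetNbBits);  /* renorm to the 2^target scale */
        assert(totalCost == 1);

        /* repay: lengthening a symbol from L bits to L+1 frees
         * 2^(targetNbBits - L - 1) units; the 2-bit symbol frees exactly 1 */
        nbBits[1] = 3;
        totalCost -= 1;
        assert(totalCost == 0);  /* {1,3,3,3,3} meets the Kraft equality again */
        return 0;
    }
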
@@ -611,6 +683,7 @@ static int HUF_buildTree(nodeElt* huffNode, U32 maxSymb= olValue) int lowS, lowN; int nodeNb =3D STARTNODE; int n, nodeRoot; + DEBUGLOG(5, "HUF_buildTree (alphabet size =3D %u)", maxSymbolValue + 1= ); /* init for parents */ nonNullRank =3D (int)maxSymbolValue; while(huffNode[nonNullRank].count =3D=3D 0) nonNullRank--; @@ -637,6 +710,8 @@ static int HUF_buildTree(nodeElt* huffNode, U32 maxSymb= olValue) for (n=3D0; n<=3DnonNullRank; n++) huffNode[n].nbBits =3D huffNode[ huffNode[n].parent ].nbBits + 1; =20 + DEBUGLOG(6, "Initial distribution of bits completed (%zu sorted symbol= s)", showHNodeBits(huffNode, maxSymbolValue+1)); + return nonNullRank; } =20 @@ -671,31 +746,40 @@ static void HUF_buildCTableFromTree(HUF_CElt* CTable,= nodeElt const* huffNode, i HUF_setNbBits(ct + huffNode[n].byte, huffNode[n].nbBits); /* pus= h nbBits per symbol, symbol order */ for (n=3D0; nhuffNodeTbl; nodeElt* const huffNode =3D huffNode0+1; int nonNullRank; =20 + HUF_STATIC_ASSERT(HUF_CTABLE_WORKSPACE_SIZE =3D=3D sizeof(HUF_buildCTa= ble_wksp_tables)); + + DEBUGLOG(5, "HUF_buildCTable_wksp (alphabet size =3D %u)", maxSymbolVa= lue+1); + /* safety checks */ if (wkspSize < sizeof(HUF_buildCTable_wksp_tables)) - return ERROR(workSpace_tooSmall); + return ERROR(workSpace_tooSmall); if (maxNbBits =3D=3D 0) maxNbBits =3D HUF_TABLELOG_DEFAULT; if (maxSymbolValue > HUF_SYMBOLVALUE_MAX) - return ERROR(maxSymbolValue_tooLarge); + return ERROR(maxSymbolValue_tooLarge); ZSTD_memset(huffNode0, 0, sizeof(huffNodeTable)); =20 /* sort, decreasing order */ HUF_sort(huffNode, count, maxSymbolValue, wksp_tables->rankPosition); + DEBUGLOG(6, "sorted symbols completed (%zu symbols)", showHNodeSymbols= (huffNode, maxSymbolValue+1)); =20 /* build tree */ nonNullRank =3D HUF_buildTree(huffNode, maxSymbolValue); =20 - /* enforce maxTableLog */ + /* determine and enforce maxTableLog */ maxNbBits =3D HUF_setMaxHeight(huffNode, (U32)nonNullRank, maxNbBits); if (maxNbBits > HUF_TABLELOG_MAX) return ERROR(GENERIC); /* check fi= t into table */ =20 @@ -716,13 +800,20 @@ size_t HUF_estimateCompressedSize(const HUF_CElt* CTa= ble, const unsigned* count, } =20 int HUF_validateCTable(const HUF_CElt* CTable, const unsigned* count, unsi= gned maxSymbolValue) { - HUF_CElt const* ct =3D CTable + 1; - int bad =3D 0; - int s; - for (s =3D 0; s <=3D (int)maxSymbolValue; ++s) { - bad |=3D (count[s] !=3D 0) & (HUF_getNbBits(ct[s]) =3D=3D 0); - } - return !bad; + HUF_CTableHeader header =3D HUF_readCTableHeader(CTable); + HUF_CElt const* ct =3D CTable + 1; + int bad =3D 0; + int s; + + assert(header.tableLog <=3D HUF_TABLELOG_ABSOLUTEMAX); + + if (header.maxSymbolValue < maxSymbolValue) + return 0; + + for (s =3D 0; s <=3D (int)maxSymbolValue; ++s) { + bad |=3D (count[s] !=3D 0) & (HUF_getNbBits(ct[s]) =3D=3D 0); + } + return !bad; } =20 size_t HUF_compressBound(size_t size) { return HUF_COMPRESSBOUND(size); } @@ -804,7 +895,7 @@ FORCE_INLINE_TEMPLATE void HUF_addBits(HUF_CStream_t* b= itC, HUF_CElt elt, int id #if DEBUGLEVEL >=3D 1 { size_t const nbBits =3D HUF_getNbBits(elt); - size_t const dirtyBits =3D nbBits =3D=3D 0 ? 0 : BIT_highbit32((U3= 2)nbBits) + 1; + size_t const dirtyBits =3D nbBits =3D=3D 0 ? 0 : ZSTD_highbit32((U= 32)nbBits) + 1; (void)dirtyBits; /* Middle bits are 0. 
*/ assert(((elt >> dirtyBits) << (dirtyBits + nbBits)) =3D=3D 0); @@ -884,7 +975,7 @@ static size_t HUF_closeCStream(HUF_CStream_t* bitC) { size_t const nbBits =3D bitC->bitPos[0] & 0xFF; if (bitC->ptr >=3D bitC->endPtr) return 0; /* overflow detected */ - return (bitC->ptr - bitC->startPtr) + (nbBits > 0); + return (size_t)(bitC->ptr - bitC->startPtr) + (nbBits > 0); } } =20 @@ -964,17 +1055,17 @@ HUF_compress1X_usingCTable_internal_body(void* dst, = size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable) { - U32 const tableLog =3D (U32)CTable[0]; + U32 const tableLog =3D HUF_readCTableHeader(CTable).tableLog; HUF_CElt const* ct =3D CTable + 1; const BYTE* ip =3D (const BYTE*) src; BYTE* const ostart =3D (BYTE*)dst; BYTE* const oend =3D ostart + dstSize; - BYTE* op =3D ostart; HUF_CStream_t bitC; =20 /* init */ if (dstSize < 8) return 0; /* not enough space to compress */ - { size_t const initErr =3D HUF_initCStream(&bitC, op, (size_t)(oend-op= )); + { BYTE* op =3D ostart; + size_t const initErr =3D HUF_initCStream(&bitC, op, (size_t)(oend-op= )); if (HUF_isError(initErr)) return 0; } =20 if (dstSize < HUF_tightCompressBound(srcSize, (size_t)tableLog) || tab= leLog > 11) @@ -1045,9 +1136,9 @@ HUF_compress1X_usingCTable_internal_default(void* dst= , size_t dstSize, static size_t HUF_compress1X_usingCTable_internal(void* dst, size_t dstSize, const void* src, size_t srcSize, - const HUF_CElt* CTable, const int bmi2) + const HUF_CElt* CTable, const int flags) { - if (bmi2) { + if (flags & HUF_flags_bmi2) { return HUF_compress1X_usingCTable_internal_bmi2(dst, dstSize, src,= srcSize, CTable); } return HUF_compress1X_usingCTable_internal_default(dst, dstSize, src, = srcSize, CTable); @@ -1058,28 +1149,23 @@ HUF_compress1X_usingCTable_internal(void* dst, size= _t dstSize, static size_t HUF_compress1X_usingCTable_internal(void* dst, size_t dstSize, const void* src, size_t srcSize, - const HUF_CElt* CTable, const int bmi2) + const HUF_CElt* CTable, const int flags) { - (void)bmi2; + (void)flags; return HUF_compress1X_usingCTable_internal_body(dst, dstSize, src, src= Size, CTable); } =20 #endif =20 -size_t HUF_compress1X_usingCTable(void* dst, size_t dstSize, const void* s= rc, size_t srcSize, const HUF_CElt* CTable) +size_t HUF_compress1X_usingCTable(void* dst, size_t dstSize, const void* s= rc, size_t srcSize, const HUF_CElt* CTable, int flags) { - return HUF_compress1X_usingCTable_bmi2(dst, dstSize, src, srcSize, CTa= ble, /* bmi2 */ 0); -} - -size_t HUF_compress1X_usingCTable_bmi2(void* dst, size_t dstSize, const vo= id* src, size_t srcSize, const HUF_CElt* CTable, int bmi2) -{ - return HUF_compress1X_usingCTable_internal(dst, dstSize, src, srcSize,= CTable, bmi2); + return HUF_compress1X_usingCTable_internal(dst, dstSize, src, srcSize,= CTable, flags); } =20 static size_t HUF_compress4X_usingCTable_internal(void* dst, size_t dstSize, const void* src, size_t srcSize, - const HUF_CElt* CTable, int bmi2) + const HUF_CElt* CTable, int flags) { size_t const segmentSize =3D (srcSize+3)/4; /* first 3 segments */ const BYTE* ip =3D (const BYTE*) src; @@ -1093,7 +1179,7 @@ HUF_compress4X_usingCTable_internal(void* dst, size_t= dstSize, op +=3D 6; /* jumpTable */ =20 assert(op <=3D oend); - { CHECK_V_F(cSize, HUF_compress1X_usingCTable_internal(op, (size_t)(= oend-op), ip, segmentSize, CTable, bmi2) ); + { CHECK_V_F(cSize, HUF_compress1X_usingCTable_internal(op, (size_t)(= oend-op), ip, segmentSize, CTable, flags) ); if (cSize =3D=3D 0 || cSize > 65535) return 0; MEM_writeLE16(ostart, 
(U16)cSize); op +=3D cSize; @@ -1101,7 +1187,7 @@ HUF_compress4X_usingCTable_internal(void* dst, size_t= dstSize, =20 ip +=3D segmentSize; assert(op <=3D oend); - { CHECK_V_F(cSize, HUF_compress1X_usingCTable_internal(op, (size_t)(= oend-op), ip, segmentSize, CTable, bmi2) ); + { CHECK_V_F(cSize, HUF_compress1X_usingCTable_internal(op, (size_t)(= oend-op), ip, segmentSize, CTable, flags) ); if (cSize =3D=3D 0 || cSize > 65535) return 0; MEM_writeLE16(ostart+2, (U16)cSize); op +=3D cSize; @@ -1109,7 +1195,7 @@ HUF_compress4X_usingCTable_internal(void* dst, size_t= dstSize, =20 ip +=3D segmentSize; assert(op <=3D oend); - { CHECK_V_F(cSize, HUF_compress1X_usingCTable_internal(op, (size_t)(= oend-op), ip, segmentSize, CTable, bmi2) ); + { CHECK_V_F(cSize, HUF_compress1X_usingCTable_internal(op, (size_t)(= oend-op), ip, segmentSize, CTable, flags) ); if (cSize =3D=3D 0 || cSize > 65535) return 0; MEM_writeLE16(ostart+4, (U16)cSize); op +=3D cSize; @@ -1118,7 +1204,7 @@ HUF_compress4X_usingCTable_internal(void* dst, size_t= dstSize, ip +=3D segmentSize; assert(op <=3D oend); assert(ip <=3D iend); - { CHECK_V_F(cSize, HUF_compress1X_usingCTable_internal(op, (size_t)(= oend-op), ip, (size_t)(iend-ip), CTable, bmi2) ); + { CHECK_V_F(cSize, HUF_compress1X_usingCTable_internal(op, (size_t)(= oend-op), ip, (size_t)(iend-ip), CTable, flags) ); if (cSize =3D=3D 0 || cSize > 65535) return 0; op +=3D cSize; } @@ -1126,14 +1212,9 @@ HUF_compress4X_usingCTable_internal(void* dst, size_= t dstSize, return (size_t)(op-ostart); } =20 -size_t HUF_compress4X_usingCTable(void* dst, size_t dstSize, const void* s= rc, size_t srcSize, const HUF_CElt* CTable) -{ - return HUF_compress4X_usingCTable_bmi2(dst, dstSize, src, srcSize, CTa= ble, /* bmi2 */ 0); -} - -size_t HUF_compress4X_usingCTable_bmi2(void* dst, size_t dstSize, const vo= id* src, size_t srcSize, const HUF_CElt* CTable, int bmi2) +size_t HUF_compress4X_usingCTable(void* dst, size_t dstSize, const void* s= rc, size_t srcSize, const HUF_CElt* CTable, int flags) { - return HUF_compress4X_usingCTable_internal(dst, dstSize, src, srcSize,= CTable, bmi2); + return HUF_compress4X_usingCTable_internal(dst, dstSize, src, srcSize,= CTable, flags); } =20 typedef enum { HUF_singleStream, HUF_fourStreams } HUF_nbStreams_e; @@ -1141,11 +1222,11 @@ typedef enum { HUF_singleStream, HUF_fourStreams } = HUF_nbStreams_e; static size_t HUF_compressCTable_internal( BYTE* const ostart, BYTE* op, BYTE* const oend, const void* src, size_t srcSize, - HUF_nbStreams_e nbStreams, const HUF_CElt* CTable, const i= nt bmi2) + HUF_nbStreams_e nbStreams, const HUF_CElt* CTable, const i= nt flags) { size_t const cSize =3D (nbStreams=3D=3DHUF_singleStream) ? 
- HUF_compress1X_usingCTable_internal(op, (size_t)(= oend - op), src, srcSize, CTable, bmi2) : - HUF_compress4X_usingCTable_internal(op, (size_t)(= oend - op), src, srcSize, CTable, bmi2); + HUF_compress1X_usingCTable_internal(op, (size_t)(= oend - op), src, srcSize, CTable, flags) : + HUF_compress4X_usingCTable_internal(op, (size_t)(= oend - op), src, srcSize, CTable, flags); if (HUF_isError(cSize)) { return cSize; } if (cSize=3D=3D0) { return 0; } /* uncompressible */ op +=3D cSize; @@ -1168,6 +1249,81 @@ typedef struct { #define SUSPECT_INCOMPRESSIBLE_SAMPLE_SIZE 4096 #define SUSPECT_INCOMPRESSIBLE_SAMPLE_RATIO 10 /* Must be >=3D 2 */ =20 +unsigned HUF_cardinality(const unsigned* count, unsigned maxSymbolValue) +{ + unsigned cardinality =3D 0; + unsigned i; + + for (i =3D 0; i < maxSymbolValue + 1; i++) { + if (count[i] !=3D 0) cardinality +=3D 1; + } + + return cardinality; +} + +unsigned HUF_minTableLog(unsigned symbolCardinality) +{ + U32 minBitsSymbols =3D ZSTD_highbit32(symbolCardinality) + 1; + return minBitsSymbols; +} + +unsigned HUF_optimalTableLog( + unsigned maxTableLog, + size_t srcSize, + unsigned maxSymbolValue, + void* workSpace, size_t wkspSize, + HUF_CElt* table, + const unsigned* count, + int flags) +{ + assert(srcSize > 1); /* Not supported, RLE should be used instead */ + assert(wkspSize >=3D sizeof(HUF_buildCTable_wksp_tables)); + + if (!(flags & HUF_flags_optimalDepth)) { + /* cheap evaluation, based on FSE */ + return FSE_optimalTableLog_internal(maxTableLog, srcSize, maxSymbo= lValue, 1); + } + + { BYTE* dst =3D (BYTE*)workSpace + sizeof(HUF_WriteCTableWksp); + size_t dstSize =3D wkspSize - sizeof(HUF_WriteCTableWksp); + size_t hSize, newSize; + const unsigned symbolCardinality =3D HUF_cardinality(count, maxSym= bolValue); + const unsigned minTableLog =3D HUF_minTableLog(symbolCardinality); + size_t optSize =3D ((size_t) ~0) - 1; + unsigned optLog =3D maxTableLog, optLogGuess; + + DEBUGLOG(6, "HUF_optimalTableLog: probing huf depth (srcSize=3D%zu= )", srcSize); + + /* Search until size increases */ + for (optLogGuess =3D minTableLog; optLogGuess <=3D maxTableLog; op= tLogGuess++) { + DEBUGLOG(7, "checking for huffLog=3D%u", optLogGuess); + + { size_t maxBits =3D HUF_buildCTable_wksp(table, count, maxS= ymbolValue, optLogGuess, workSpace, wkspSize); + if (ERR_isError(maxBits)) continue; + + if (maxBits < optLogGuess && optLogGuess > minTableLog) br= eak; + + hSize =3D HUF_writeCTable_wksp(dst, dstSize, table, maxSym= bolValue, (U32)maxBits, workSpace, wkspSize); + } + + if (ERR_isError(hSize)) continue; + + newSize =3D HUF_estimateCompressedSize(table, count, maxSymbol= Value) + hSize; + + if (newSize > optSize + 1) { + break; + } + + if (newSize < optSize) { + optSize =3D newSize; + optLog =3D optLogGuess; + } + } + assert(optLog <=3D HUF_TABLELOG_MAX); + return optLog; + } +} + /* HUF_compress_internal() : * `workSpace_align4` must be aligned on 4-bytes boundaries, * and occupies the same space as a table of HUF_WORKSPACE_SIZE_U64 unsign= ed */ @@ -1177,14 +1333,14 @@ HUF_compress_internal (void* dst, size_t dstSize, unsigned maxSymbolValue, unsigned huffLog, HUF_nbStreams_e nbStreams, void* workSpace, size_t wkspSize, - HUF_CElt* oldHufTable, HUF_repeat* repeat, int pref= erRepeat, - const int bmi2, unsigned suspectUncompressible) + HUF_CElt* oldHufTable, HUF_repeat* repeat, int flag= s) { HUF_compress_tables_t* const table =3D (HUF_compress_tables_t*)HUF_ali= gnUpWorkspace(workSpace, &wkspSize, ZSTD_ALIGNOF(size_t)); BYTE* const ostart =3D (BYTE*)dst; BYTE* 
const oend =3D ostart + dstSize; BYTE* op =3D ostart; =20 + DEBUGLOG(5, "HUF_compress_internal (srcSize=3D%zu)", srcSize); HUF_STATIC_ASSERT(sizeof(*table) + HUF_WORKSPACE_MAX_ALIGNMENT <=3D HU= F_WORKSPACE_SIZE); =20 /* checks & inits */ @@ -1198,16 +1354,17 @@ HUF_compress_internal (void* dst, size_t dstSize, if (!huffLog) huffLog =3D HUF_TABLELOG_DEFAULT; =20 /* Heuristic : If old table is valid, use it for small inputs */ - if (preferRepeat && repeat && *repeat =3D=3D HUF_repeat_valid) { + if ((flags & HUF_flags_preferRepeat) && repeat && *repeat =3D=3D HUF_r= epeat_valid) { return HUF_compressCTable_internal(ostart, op, oend, src, srcSize, - nbStreams, oldHufTable, bmi2); + nbStreams, oldHufTable, flags); } =20 /* If uncompressible data is suspected, do a smaller sampling first */ DEBUG_STATIC_ASSERT(SUSPECT_INCOMPRESSIBLE_SAMPLE_RATIO >=3D 2); - if (suspectUncompressible && srcSize >=3D (SUSPECT_INCOMPRESSIBLE_SAMP= LE_SIZE * SUSPECT_INCOMPRESSIBLE_SAMPLE_RATIO)) { + if ((flags & HUF_flags_suspectUncompressible) && srcSize >=3D (SUSPECT= _INCOMPRESSIBLE_SAMPLE_SIZE * SUSPECT_INCOMPRESSIBLE_SAMPLE_RATIO)) { size_t largestTotal =3D 0; + DEBUGLOG(5, "input suspected incompressible : sampling to check"); { unsigned maxSymbolValueBegin =3D maxSymbolValue; CHECK_V_F(largestBegin, HIST_count_simple (table->count, &maxS= ymbolValueBegin, (const BYTE*)src, SUSPECT_INCOMPRESSIBLE_SAMPLE_SIZE) ); largestTotal +=3D largestBegin; @@ -1224,6 +1381,7 @@ HUF_compress_internal (void* dst, size_t dstSize, if (largest =3D=3D srcSize) { *ostart =3D ((const BYTE*)src)[0]; r= eturn 1; } /* single symbol, rle */ if (largest <=3D (srcSize >> 7)+4) return 0; /* heuristic : prob= ably not compressible enough */ } + DEBUGLOG(6, "histogram detail completed (%zu symbols)", showU32(table-= >count, maxSymbolValue+1)); =20 /* Check validity of previous table */ if ( repeat @@ -1232,25 +1390,20 @@ HUF_compress_internal (void* dst, size_t dstSize, *repeat =3D HUF_repeat_none; } /* Heuristic : use existing table for small inputs */ - if (preferRepeat && repeat && *repeat !=3D HUF_repeat_none) { + if ((flags & HUF_flags_preferRepeat) && repeat && *repeat !=3D HUF_rep= eat_none) { return HUF_compressCTable_internal(ostart, op, oend, src, srcSize, - nbStreams, oldHufTable, bmi2); + nbStreams, oldHufTable, flags); } =20 /* Build Huffman Tree */ - huffLog =3D HUF_optimalTableLog(huffLog, srcSize, maxSymbolValue); + huffLog =3D HUF_optimalTableLog(huffLog, srcSize, maxSymbolValue, &tab= le->wksps, sizeof(table->wksps), table->CTable, table->count, flags); { size_t const maxBits =3D HUF_buildCTable_wksp(table->CTable, table= ->count, maxSymbolValue, huffLog, &table->wksps.buildCTable_wksp= , sizeof(table->wksps.buildCTable_wksp)); CHECK_F(maxBits); huffLog =3D (U32)maxBits; - } - /* Zero unused symbols in CTable, so we can check it for validity */ - { - size_t const ctableSize =3D HUF_CTABLE_SIZE_ST(maxSymbolValue); - size_t const unusedSize =3D sizeof(table->CTable) - ctableSize * s= izeof(HUF_CElt); - ZSTD_memset(table->CTable + ctableSize, 0, unusedSize); + DEBUGLOG(6, "bit distribution completed (%zu symbols)", showCTable= Bits(table->CTable + 1, maxSymbolValue+1)); } =20 /* Write table description header */ @@ -1263,7 +1416,7 @@ HUF_compress_internal (void* dst, size_t dstSize, if (oldSize <=3D hSize + newSize || hSize + 12 >=3D srcSize) { return HUF_compressCTable_internal(ostart, op, oend, src, srcSize, - nbStreams, oldHufTable,= bmi2); + nbStreams, oldHufTable,= flags); } } =20 /* Use the new huffman table */ @@ 
-1275,61 +1428,35 @@ HUF_compress_internal (void* dst, size_t dstSize, } return HUF_compressCTable_internal(ostart, op, oend, src, srcSize, - nbStreams, table->CTable, bmi2); -} - - -size_t HUF_compress1X_wksp (void* dst, size_t dstSize, - const void* src, size_t srcSize, - unsigned maxSymbolValue, unsigned huffLog, - void* workSpace, size_t wkspSize) -{ - return HUF_compress_internal(dst, dstSize, src, srcSize, - maxSymbolValue, huffLog, HUF_singleStream, - workSpace, wkspSize, - NULL, NULL, 0, 0 /*bmi2*/, 0); + nbStreams, table->CTable, flags); } =20 size_t HUF_compress1X_repeat (void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned huffLog, void* workSpace, size_t wkspSize, - HUF_CElt* hufTable, HUF_repeat* repeat, int preferRe= peat, - int bmi2, unsigned suspectUncompressible) + HUF_CElt* hufTable, HUF_repeat* repeat, int flags) { + DEBUGLOG(5, "HUF_compress1X_repeat (srcSize =3D %zu)", srcSize); return HUF_compress_internal(dst, dstSize, src, srcSize, maxSymbolValue, huffLog, HUF_singleStream, workSpace, wkspSize, hufTable, - repeat, preferRepeat, bmi2, suspectUncomp= ressible); -} - -/* HUF_compress4X_repeat(): - * compress input using 4 streams. - * provide workspace to generate compression tables */ -size_t HUF_compress4X_wksp (void* dst, size_t dstSize, - const void* src, size_t srcSize, - unsigned maxSymbolValue, unsigned huffLog, - void* workSpace, size_t wkspSize) -{ - return HUF_compress_internal(dst, dstSize, src, srcSize, - maxSymbolValue, huffLog, HUF_fourStreams, - workSpace, wkspSize, - NULL, NULL, 0, 0 /*bmi2*/, 0); + repeat, flags); } =20 /* HUF_compress4X_repeat(): * compress input using 4 streams. * consider skipping quickly - * re-use an existing huffman compression table */ + * reuse an existing huffman compression table */ size_t HUF_compress4X_repeat (void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned huffLog, void* workSpace, size_t wkspSize, - HUF_CElt* hufTable, HUF_repeat* repeat, int preferRe= peat, int bmi2, unsigned suspectUncompressible) + HUF_CElt* hufTable, HUF_repeat* repeat, int flags) { + DEBUGLOG(5, "HUF_compress4X_repeat (srcSize =3D %zu)", srcSize); return HUF_compress_internal(dst, dstSize, src, srcSize, maxSymbolValue, huffLog, HUF_fourStreams, workSpace, wkspSize, - hufTable, repeat, preferRepeat, bmi2, sus= pectUncompressible); + hufTable, repeat, flags); } - diff --git a/lib/zstd/compress/zstd_compress.c b/lib/zstd/compress/zstd_com= press.c index 16bb995bc6c4..c41a747413e0 100644 --- a/lib/zstd/compress/zstd_compress.c +++ b/lib/zstd/compress/zstd_compress.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
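
Two API shifts in the huf_compress.c hunks above are worth flagging for in-kernel callers. First, the trailing boolean parameters (bmi2, preferRepeat, suspectUncompressible) are folded into a single int flags bit-set (HUF_flags_bmi2, HUF_flags_preferRepeat, HUF_flags_suspectUncompressible, ...). Second, HUF_optimalTableLog() no longer always reuses FSE's cheap heuristic: when HUF_flags_optimalDepth is set, it probes each candidate depth from HUF_minTableLog(cardinality) up to maxTableLog, builds a real table per guess, and keeps the depth minimizing header size plus estimated payload, stopping once the size trend turns upward. The shape of that search, reduced to a sketch (evaluate() is a toy stand-in for the build-table + write-header + estimate-size sequence, not upstream code):

    #include <stddef.h>

    /* hypothetical stand-in: a convex cost curve with its minimum at 7 */
    static size_t evaluate(unsigned logGuess)
    {
        return (size_t)(logGuess > 7 ? logGuess - 7 : 7 - logGuess) + 10;
    }

    static unsigned searchOptimalLog(unsigned minLog, unsigned maxLog)
    {
        size_t optSize = ((size_t)~0) - 1;   /* sentinel, as upstream uses */
        unsigned optLog = maxLog;
        unsigned guess;
        for (guess = minLog; guess <= maxLog; guess++) {
            size_t const newSize = evaluate(guess);
            if (newSize > optSize + 1) break;   /* size growing again: stop early */
            if (newSize < optSize) { optSize = newSize; optLog = guess; }
        }
        return optLog;   /* == 7 for the toy cost above */
    }
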
* * This source code is licensed under both the BSD-style license (found in= the @@ -11,12 +12,13 @@ /*-************************************* * Dependencies ***************************************/ +#include "../common/allocations.h" /* ZSTD_customMalloc, ZSTD_customCallo= c, ZSTD_customFree */ #include "../common/zstd_deps.h" /* INT_MAX, ZSTD_memset, ZSTD_memcpy */ #include "../common/mem.h" +#include "../common/error_private.h" #include "hist.h" /* HIST_countFast_wksp */ #define FSE_STATIC_LINKING_ONLY /* FSE_encodeSymbol */ #include "../common/fse.h" -#define HUF_STATIC_LINKING_ONLY #include "../common/huf.h" #include "zstd_compress_internal.h" #include "zstd_compress_sequences.h" @@ -27,6 +29,7 @@ #include "zstd_opt.h" #include "zstd_ldm.h" #include "zstd_compress_superblock.h" +#include "../common/bits.h" /* ZSTD_highbit32, ZSTD_rotateRight_U64 = */ =20 /* *************************************************************** * Tuning parameters @@ -44,7 +47,7 @@ * in log format, aka 17 =3D> 1 << 17 =3D=3D 128Ki positions. * This structure is only used in zstd_opt. * Since allocation is centralized for all strategies, it has to be known = here. - * The actual (selected) size of the hash table is then stored in ZSTD_mat= chState_t.hashLog3, + * The actual (selected) size of the hash table is then stored in ZSTD_Mat= chState_t.hashLog3, * so that zstd_opt.c doesn't need to know about this constant. */ #ifndef ZSTD_HASHLOG3_MAX @@ -55,14 +58,17 @@ * Helper functions ***************************************/ /* ZSTD_compressBound() - * Note that the result from this function is only compatible with the "no= rmal" - * full-block strategy. - * When there are a lot of small blocks due to frequent flush in streaming= mode - * the overhead of headers can make the compressed data to be larger than = the - * return value of ZSTD_compressBound(). + * Note that the result from this function is only valid for + * the one-pass compression functions. + * When employing the streaming mode, + * if flushes are frequently altering the size of blocks, + * the overhead from block headers can make the compressed data larger + * than the return value of ZSTD_compressBound(). */ size_t ZSTD_compressBound(size_t srcSize) { - return ZSTD_COMPRESSBOUND(srcSize); + size_t const r =3D ZSTD_COMPRESSBOUND(srcSize); + if (r=3D=3D0) return ERROR(srcSize_wrong); + return r; } =20 =20 @@ -75,12 +81,12 @@ struct ZSTD_CDict_s { ZSTD_dictContentType_e dictContentType; /* The dictContentType the CDi= ct was created with */ U32* entropyWorkspace; /* entropy workspace of HUF_WORKSPACE_SIZE byte= s */ ZSTD_cwksp workspace; - ZSTD_matchState_t matchState; + ZSTD_MatchState_t matchState; ZSTD_compressedBlockState_t cBlockState; ZSTD_customMem customMem; U32 dictID; int compressionLevel; /* 0 indicates that advanced API was used to sel= ect CDict params */ - ZSTD_paramSwitch_e useRowMatchFinder; /* Indicates whether the CDict w= as created with params that would use + ZSTD_ParamSwitch_e useRowMatchFinder; /* Indicates whether the CDict w= as created with params that would use * row-based matchfinder. Unless= the cdict is reloaded, we will use * the same greedy/lazy matchfin= der at compression time. */ @@ -130,11 +136,12 @@ ZSTD_CCtx* ZSTD_initStaticCCtx(void* workspace, size_= t workspaceSize) ZSTD_cwksp_move(&cctx->workspace, &ws); cctx->staticSize =3D workspaceSize; =20 - /* statically sized space. 
entropyWorkspace never moves (but prev/next= block swap places) */ - if (!ZSTD_cwksp_check_available(&cctx->workspace, ENTROPY_WORKSPACE_SI= ZE + 2 * sizeof(ZSTD_compressedBlockState_t))) return NULL; + /* statically sized space. tmpWorkspace never moves (but prev/next blo= ck swap places) */ + if (!ZSTD_cwksp_check_available(&cctx->workspace, TMP_WORKSPACE_SIZE += 2 * sizeof(ZSTD_compressedBlockState_t))) return NULL; cctx->blockState.prevCBlock =3D (ZSTD_compressedBlockState_t*)ZSTD_cwk= sp_reserve_object(&cctx->workspace, sizeof(ZSTD_compressedBlockState_t)); cctx->blockState.nextCBlock =3D (ZSTD_compressedBlockState_t*)ZSTD_cwk= sp_reserve_object(&cctx->workspace, sizeof(ZSTD_compressedBlockState_t)); - cctx->entropyWorkspace =3D (U32*)ZSTD_cwksp_reserve_object(&cctx->work= space, ENTROPY_WORKSPACE_SIZE); + cctx->tmpWorkspace =3D ZSTD_cwksp_reserve_object(&cctx->workspace, TMP= _WORKSPACE_SIZE); + cctx->tmpWkspSize =3D TMP_WORKSPACE_SIZE; cctx->bmi2 =3D ZSTD_cpuid_bmi2(ZSTD_cpuid()); return cctx; } @@ -168,15 +175,13 @@ static void ZSTD_freeCCtxContent(ZSTD_CCtx* cctx) =20 size_t ZSTD_freeCCtx(ZSTD_CCtx* cctx) { + DEBUGLOG(3, "ZSTD_freeCCtx (address: %p)", (void*)cctx); if (cctx=3D=3DNULL) return 0; /* support free on NULL */ RETURN_ERROR_IF(cctx->staticSize, memory_allocation, "not compatible with static CCtx"); - { - int cctxInWorkspace =3D ZSTD_cwksp_owns_buffer(&cctx->workspace, c= ctx); + { int cctxInWorkspace =3D ZSTD_cwksp_owns_buffer(&cctx->workspace, c= ctx); ZSTD_freeCCtxContent(cctx); - if (!cctxInWorkspace) { - ZSTD_customFree(cctx, cctx->customMem); - } + if (!cctxInWorkspace) ZSTD_customFree(cctx, cctx->customMem); } return 0; } @@ -205,7 +210,7 @@ size_t ZSTD_sizeof_CStream(const ZSTD_CStream* zcs) } =20 /* private API call, for dictBuilder only */ -const seqStore_t* ZSTD_getSeqStore(const ZSTD_CCtx* ctx) { return &(ctx->s= eqStore); } +const SeqStore_t* ZSTD_getSeqStore(const ZSTD_CCtx* ctx) { return &(ctx->s= eqStore); } =20 /* Returns true if the strategy supports using a row based matchfinder */ static int ZSTD_rowMatchFinderSupported(const ZSTD_strategy strategy) { @@ -215,32 +220,27 @@ static int ZSTD_rowMatchFinderSupported(const ZSTD_st= rategy strategy) { /* Returns true if the strategy and useRowMatchFinder mode indicate that w= e will use the row based matchfinder * for this compression. */ -static int ZSTD_rowMatchFinderUsed(const ZSTD_strategy strategy, const ZST= D_paramSwitch_e mode) { +static int ZSTD_rowMatchFinderUsed(const ZSTD_strategy strategy, const ZST= D_ParamSwitch_e mode) { assert(mode !=3D ZSTD_ps_auto); return ZSTD_rowMatchFinderSupported(strategy) && (mode =3D=3D ZSTD_ps_= enable); } =20 /* Returns row matchfinder usage given an initial mode and cParams */ -static ZSTD_paramSwitch_e ZSTD_resolveRowMatchFinderMode(ZSTD_paramSwitch_= e mode, +static ZSTD_ParamSwitch_e ZSTD_resolveRowMatchFinderMode(ZSTD_ParamSwitch_= e mode, const ZSTD_compre= ssionParameters* const cParams) { -#if defined(ZSTD_ARCH_X86_SSE2) || defined(ZSTD_ARCH_ARM_NEON) - int const kHasSIMD128 =3D 1; -#else - int const kHasSIMD128 =3D 0; -#endif + /* The Linux Kernel does not use SIMD, and 128KB is a very common size= , e.g. in BtrFS. + * The row match finder is slower for this size without SIMD, so disab= le it. 
+ */ + const unsigned kWindowLogLowerBound =3D 17; if (mode !=3D ZSTD_ps_auto) return mode; /* if requested enabled, but = no SIMD, we still will use row matchfinder */ mode =3D ZSTD_ps_disable; if (!ZSTD_rowMatchFinderSupported(cParams->strategy)) return mode; - if (kHasSIMD128) { - if (cParams->windowLog > 14) mode =3D ZSTD_ps_enable; - } else { - if (cParams->windowLog > 17) mode =3D ZSTD_ps_enable; - } + if (cParams->windowLog > kWindowLogLowerBound) mode =3D ZSTD_ps_enable; return mode; } =20 /* Returns block splitter usage (generally speaking, when using slower/str= onger compression modes) */ -static ZSTD_paramSwitch_e ZSTD_resolveBlockSplitterMode(ZSTD_paramSwitch_e= mode, +static ZSTD_ParamSwitch_e ZSTD_resolveBlockSplitterMode(ZSTD_ParamSwitch_e= mode, const ZSTD_compres= sionParameters* const cParams) { if (mode !=3D ZSTD_ps_auto) return mode; return (cParams->strategy >=3D ZSTD_btopt && cParams->windowLog >=3D 1= 7) ? ZSTD_ps_enable : ZSTD_ps_disable; @@ -248,7 +248,7 @@ static ZSTD_paramSwitch_e ZSTD_resolveBlockSplitterMode= (ZSTD_paramSwitch_e mode, =20 /* Returns 1 if the arguments indicate that we should allocate a chainTabl= e, 0 otherwise */ static int ZSTD_allocateChainTable(const ZSTD_strategy strategy, - const ZSTD_paramSwitch_e useRowMatchFin= der, + const ZSTD_ParamSwitch_e useRowMatchFin= der, const U32 forDDSDict) { assert(useRowMatchFinder !=3D ZSTD_ps_auto); /* We always should allocate a chaintable if we are allocating a match= state for a DDS dictionary matchstate. @@ -257,16 +257,44 @@ static int ZSTD_allocateChainTable(const ZSTD_strateg= y strategy, return forDDSDict || ((strategy !=3D ZSTD_fast) && !ZSTD_rowMatchFinde= rUsed(strategy, useRowMatchFinder)); } =20 -/* Returns 1 if compression parameters are such that we should +/* Returns ZSTD_ps_enable if compression parameters are such that we should * enable long distance matching (wlog >=3D 27, strategy >=3D btopt). - * Returns 0 otherwise. + * Returns ZSTD_ps_disable otherwise. */ -static ZSTD_paramSwitch_e ZSTD_resolveEnableLdm(ZSTD_paramSwitch_e mode, +static ZSTD_ParamSwitch_e ZSTD_resolveEnableLdm(ZSTD_ParamSwitch_e mode, const ZSTD_compressionParameters* const c= Params) { if (mode !=3D ZSTD_ps_auto) return mode; return (cParams->strategy >=3D ZSTD_btopt && cParams->windowLog >=3D 2= 7) ? ZSTD_ps_enable : ZSTD_ps_disable; } =20 +static int ZSTD_resolveExternalSequenceValidation(int mode) { + return mode; +} + +/* Resolves maxBlockSize to the default if no value is present. */ +static size_t ZSTD_resolveMaxBlockSize(size_t maxBlockSize) { + if (maxBlockSize =3D=3D 0) { + return ZSTD_BLOCKSIZE_MAX; + } else { + return maxBlockSize; + } +} + +static ZSTD_ParamSwitch_e ZSTD_resolveExternalRepcodeSearch(ZSTD_ParamSwit= ch_e value, int cLevel) { + if (value !=3D ZSTD_ps_auto) return value; + if (cLevel < 10) { + return ZSTD_ps_disable; + } else { + return ZSTD_ps_enable; + } +} + +/* Returns 1 if compression parameters are such that CDict hashtable and c= haintable indices are tagged. + * If so, the tags need to be removed in ZSTD_resetCCtx_byCopyingCDict. 
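
A small but pervasive pattern in this import: every parameter that accepts ZSTD_ps_auto is pinned to a concrete enable/disable value early, through a resolve helper, so downstream code never has to handle "auto". The kernel-specific twist above is the row match finder: since the kernel build uses no SIMD, it is only enabled for windowLog > 17 (128 KB being a very common size, e.g. in BtrFS). The helper shape, condensed (this mirrors ZSTD_resolveExternalRepcodeSearch above; it is a restatement, not a new API):

    static ZSTD_ParamSwitch_e resolveByLevel(ZSTD_ParamSwitch_e value, int cLevel)
    {
        if (value != ZSTD_ps_auto)
            return value;                    /* explicit user choice wins */
        return (cLevel < 10) ? ZSTD_ps_disable : ZSTD_ps_enable;
    }
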
*/ +static int ZSTD_CDictIndicesAreTagged(const ZSTD_compressionParameters* co= nst cParams) { + return cParams->strategy =3D=3D ZSTD_fast || cParams->strategy =3D=3D = ZSTD_dfast; +} + static ZSTD_CCtx_params ZSTD_makeCCtxParamsFromCParams( ZSTD_compressionParameters cParams) { @@ -282,8 +310,12 @@ static ZSTD_CCtx_params ZSTD_makeCCtxParamsFromCParams( assert(cctxParams.ldmParams.hashLog >=3D cctxParams.ldmParams.buck= etSizeLog); assert(cctxParams.ldmParams.hashRateLog < 32); } - cctxParams.useBlockSplitter =3D ZSTD_resolveBlockSplitterMode(cctxPara= ms.useBlockSplitter, &cParams); + cctxParams.postBlockSplitter =3D ZSTD_resolveBlockSplitterMode(cctxPar= ams.postBlockSplitter, &cParams); cctxParams.useRowMatchFinder =3D ZSTD_resolveRowMatchFinderMode(cctxPa= rams.useRowMatchFinder, &cParams); + cctxParams.validateSequences =3D ZSTD_resolveExternalSequenceValidatio= n(cctxParams.validateSequences); + cctxParams.maxBlockSize =3D ZSTD_resolveMaxBlockSize(cctxParams.maxBlo= ckSize); + cctxParams.searchForExternalRepcodes =3D ZSTD_resolveExternalRepcodeSe= arch(cctxParams.searchForExternalRepcodes, + = cctxParams.compressionLevel); assert(!ZSTD_checkCParams(cParams)); return cctxParams; } @@ -329,10 +361,13 @@ size_t ZSTD_CCtxParams_init(ZSTD_CCtx_params* cctxPar= ams, int compressionLevel) #define ZSTD_NO_CLEVEL 0 =20 /* - * Initializes the cctxParams from params and compressionLevel. + * Initializes `cctxParams` from `params` and `compressionLevel`. * @param compressionLevel If params are derived from a compression level = then that compression level, otherwise ZSTD_NO_CLEVEL. */ -static void ZSTD_CCtxParams_init_internal(ZSTD_CCtx_params* cctxParams, ZS= TD_parameters const* params, int compressionLevel) +static void +ZSTD_CCtxParams_init_internal(ZSTD_CCtx_params* cctxParams, + const ZSTD_parameters* params, + int compressionLevel) { assert(!ZSTD_checkCParams(params->cParams)); ZSTD_memset(cctxParams, 0, sizeof(*cctxParams)); @@ -343,10 +378,13 @@ static void ZSTD_CCtxParams_init_internal(ZSTD_CCtx_p= arams* cctxParams, ZSTD_par */ cctxParams->compressionLevel =3D compressionLevel; cctxParams->useRowMatchFinder =3D ZSTD_resolveRowMatchFinderMode(cctxP= arams->useRowMatchFinder, ¶ms->cParams); - cctxParams->useBlockSplitter =3D ZSTD_resolveBlockSplitterMode(cctxPar= ams->useBlockSplitter, ¶ms->cParams); + cctxParams->postBlockSplitter =3D ZSTD_resolveBlockSplitterMode(cctxPa= rams->postBlockSplitter, ¶ms->cParams); cctxParams->ldmParams.enableLdm =3D ZSTD_resolveEnableLdm(cctxParams->= ldmParams.enableLdm, ¶ms->cParams); + cctxParams->validateSequences =3D ZSTD_resolveExternalSequenceValidati= on(cctxParams->validateSequences); + cctxParams->maxBlockSize =3D ZSTD_resolveMaxBlockSize(cctxParams->maxB= lockSize); + cctxParams->searchForExternalRepcodes =3D ZSTD_resolveExternalRepcodeS= earch(cctxParams->searchForExternalRepcodes, compressionLevel); DEBUGLOG(4, "ZSTD_CCtxParams_init_internal: useRowMatchFinder=3D%d, us= eBlockSplitter=3D%d ldm=3D%d", - cctxParams->useRowMatchFinder, cctxParams->useBlockSplitte= r, cctxParams->ldmParams.enableLdm); + cctxParams->useRowMatchFinder, cctxParams->postBlockSplitt= er, cctxParams->ldmParams.enableLdm); } =20 size_t ZSTD_CCtxParams_init_advanced(ZSTD_CCtx_params* cctxParams, ZSTD_pa= rameters params) @@ -359,7 +397,7 @@ size_t ZSTD_CCtxParams_init_advanced(ZSTD_CCtx_params* = cctxParams, ZSTD_paramete =20 /* * Sets cctxParams' cParams and fParams from params, but otherwise leaves = them alone. - * @param param Validated zstd parameters. 
+ * @param params Validated zstd parameters. */ static void ZSTD_CCtxParams_setZstdParams( ZSTD_CCtx_params* cctxParams, const ZSTD_parameters* params) @@ -455,8 +493,8 @@ ZSTD_bounds ZSTD_cParam_getBounds(ZSTD_cParameter param) return bounds; =20 case ZSTD_c_enableLongDistanceMatching: - bounds.lowerBound =3D 0; - bounds.upperBound =3D 1; + bounds.lowerBound =3D (int)ZSTD_ps_auto; + bounds.upperBound =3D (int)ZSTD_ps_disable; return bounds; =20 case ZSTD_c_ldmHashLog: @@ -534,11 +572,16 @@ ZSTD_bounds ZSTD_cParam_getBounds(ZSTD_cParameter par= am) bounds.upperBound =3D 1; return bounds; =20 - case ZSTD_c_useBlockSplitter: + case ZSTD_c_splitAfterSequences: bounds.lowerBound =3D (int)ZSTD_ps_auto; bounds.upperBound =3D (int)ZSTD_ps_disable; return bounds; =20 + case ZSTD_c_blockSplitterLevel: + bounds.lowerBound =3D 0; + bounds.upperBound =3D ZSTD_BLOCKSPLITTER_LEVEL_MAX; + return bounds; + case ZSTD_c_useRowMatchFinder: bounds.lowerBound =3D (int)ZSTD_ps_auto; bounds.upperBound =3D (int)ZSTD_ps_disable; @@ -549,6 +592,26 @@ ZSTD_bounds ZSTD_cParam_getBounds(ZSTD_cParameter para= m) bounds.upperBound =3D 1; return bounds; =20 + case ZSTD_c_prefetchCDictTables: + bounds.lowerBound =3D (int)ZSTD_ps_auto; + bounds.upperBound =3D (int)ZSTD_ps_disable; + return bounds; + + case ZSTD_c_enableSeqProducerFallback: + bounds.lowerBound =3D 0; + bounds.upperBound =3D 1; + return bounds; + + case ZSTD_c_maxBlockSize: + bounds.lowerBound =3D ZSTD_BLOCKSIZE_MAX_MIN; + bounds.upperBound =3D ZSTD_BLOCKSIZE_MAX; + return bounds; + + case ZSTD_c_repcodeResolution: + bounds.lowerBound =3D (int)ZSTD_ps_auto; + bounds.upperBound =3D (int)ZSTD_ps_disable; + return bounds; + default: bounds.error =3D ERROR(parameter_unsupported); return bounds; @@ -567,10 +630,11 @@ static size_t ZSTD_cParam_clampBounds(ZSTD_cParameter= cParam, int* value) return 0; } =20 -#define BOUNDCHECK(cParam, val) { \ - RETURN_ERROR_IF(!ZSTD_cParam_withinBounds(cParam,val), \ - parameter_outOfBound, "Param out of bounds"); \ -} +#define BOUNDCHECK(cParam, val) \ + do { \ + RETURN_ERROR_IF(!ZSTD_cParam_withinBounds(cParam,val), \ + parameter_outOfBound, "Param out of bounds"); \ + } while (0) =20 =20 static int ZSTD_isUpdateAuthorized(ZSTD_cParameter param) @@ -584,6 +648,7 @@ static int ZSTD_isUpdateAuthorized(ZSTD_cParameter para= m) case ZSTD_c_minMatch: case ZSTD_c_targetLength: case ZSTD_c_strategy: + case ZSTD_c_blockSplitterLevel: return 1; =20 case ZSTD_c_format: @@ -610,9 +675,13 @@ static int ZSTD_isUpdateAuthorized(ZSTD_cParameter par= am) case ZSTD_c_stableOutBuffer: case ZSTD_c_blockDelimiters: case ZSTD_c_validateSequences: - case ZSTD_c_useBlockSplitter: + case ZSTD_c_splitAfterSequences: case ZSTD_c_useRowMatchFinder: case ZSTD_c_deterministicRefPrefix: + case ZSTD_c_prefetchCDictTables: + case ZSTD_c_enableSeqProducerFallback: + case ZSTD_c_maxBlockSize: + case ZSTD_c_repcodeResolution: default: return 0; } @@ -625,7 +694,7 @@ size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cPa= rameter param, int value) if (ZSTD_isUpdateAuthorized(param)) { cctx->cParamsChanged =3D 1; } else { - RETURN_ERROR(stage_wrong, "can only set params in ctx init sta= ge"); + RETURN_ERROR(stage_wrong, "can only set params in cctx init st= age"); } } =20 switch(param) @@ -665,9 +734,14 @@ size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cP= arameter param, int value) case ZSTD_c_stableOutBuffer: case ZSTD_c_blockDelimiters: case ZSTD_c_validateSequences: - case ZSTD_c_useBlockSplitter: + case ZSTD_c_splitAfterSequences: + case 
ZSTD_c_blockSplitterLevel: case ZSTD_c_useRowMatchFinder: case ZSTD_c_deterministicRefPrefix: + case ZSTD_c_prefetchCDictTables: + case ZSTD_c_enableSeqProducerFallback: + case ZSTD_c_maxBlockSize: + case ZSTD_c_repcodeResolution: break; =20 default: RETURN_ERROR(parameter_unsupported, "unknown parameter"); @@ -723,12 +797,12 @@ size_t ZSTD_CCtxParams_setParameter(ZSTD_CCtx_params*= CCtxParams, case ZSTD_c_minMatch : if (value!=3D0) /* 0 =3D> use default */ BOUNDCHECK(ZSTD_c_minMatch, value); - CCtxParams->cParams.minMatch =3D value; + CCtxParams->cParams.minMatch =3D (U32)value; return CCtxParams->cParams.minMatch; =20 case ZSTD_c_targetLength : BOUNDCHECK(ZSTD_c_targetLength, value); - CCtxParams->cParams.targetLength =3D value; + CCtxParams->cParams.targetLength =3D (U32)value; return CCtxParams->cParams.targetLength; =20 case ZSTD_c_strategy : @@ -741,12 +815,12 @@ size_t ZSTD_CCtxParams_setParameter(ZSTD_CCtx_params*= CCtxParams, /* Content size written in frame header _when known_ (default:1) */ DEBUGLOG(4, "set content size flag =3D %u", (value!=3D0)); CCtxParams->fParams.contentSizeFlag =3D value !=3D 0; - return CCtxParams->fParams.contentSizeFlag; + return (size_t)CCtxParams->fParams.contentSizeFlag; =20 case ZSTD_c_checksumFlag : /* A 32-bits content checksum will be calculated and written at en= d of frame (default:0) */ CCtxParams->fParams.checksumFlag =3D value !=3D 0; - return CCtxParams->fParams.checksumFlag; + return (size_t)CCtxParams->fParams.checksumFlag; =20 case ZSTD_c_dictIDFlag : /* When applicable, dictionary's dictID is pr= ovided in frame header (default:1) */ DEBUGLOG(4, "set dictIDFlag =3D %u", (value!=3D0)); @@ -755,18 +829,18 @@ size_t ZSTD_CCtxParams_setParameter(ZSTD_CCtx_params*= CCtxParams, =20 case ZSTD_c_forceMaxWindow : CCtxParams->forceWindow =3D (value !=3D 0); - return CCtxParams->forceWindow; + return (size_t)CCtxParams->forceWindow; =20 case ZSTD_c_forceAttachDict : { const ZSTD_dictAttachPref_e pref =3D (ZSTD_dictAttachPref_e)value; - BOUNDCHECK(ZSTD_c_forceAttachDict, pref); + BOUNDCHECK(ZSTD_c_forceAttachDict, (int)pref); CCtxParams->attachDictPref =3D pref; return CCtxParams->attachDictPref; } =20 case ZSTD_c_literalCompressionMode : { - const ZSTD_paramSwitch_e lcm =3D (ZSTD_paramSwitch_e)value; - BOUNDCHECK(ZSTD_c_literalCompressionMode, lcm); + const ZSTD_ParamSwitch_e lcm =3D (ZSTD_ParamSwitch_e)value; + BOUNDCHECK(ZSTD_c_literalCompressionMode, (int)lcm); CCtxParams->literalCompressionMode =3D lcm; return CCtxParams->literalCompressionMode; } @@ -789,47 +863,50 @@ size_t ZSTD_CCtxParams_setParameter(ZSTD_CCtx_params*= CCtxParams, =20 case ZSTD_c_enableDedicatedDictSearch : CCtxParams->enableDedicatedDictSearch =3D (value!=3D0); - return CCtxParams->enableDedicatedDictSearch; + return (size_t)CCtxParams->enableDedicatedDictSearch; =20 case ZSTD_c_enableLongDistanceMatching : - CCtxParams->ldmParams.enableLdm =3D (ZSTD_paramSwitch_e)value; + BOUNDCHECK(ZSTD_c_enableLongDistanceMatching, value); + CCtxParams->ldmParams.enableLdm =3D (ZSTD_ParamSwitch_e)value; return CCtxParams->ldmParams.enableLdm; =20 case ZSTD_c_ldmHashLog : if (value!=3D0) /* 0 =3D=3D> auto */ BOUNDCHECK(ZSTD_c_ldmHashLog, value); - CCtxParams->ldmParams.hashLog =3D value; + CCtxParams->ldmParams.hashLog =3D (U32)value; return CCtxParams->ldmParams.hashLog; =20 case ZSTD_c_ldmMinMatch : if (value!=3D0) /* 0 =3D=3D> default */ BOUNDCHECK(ZSTD_c_ldmMinMatch, value); - CCtxParams->ldmParams.minMatchLength =3D value; + CCtxParams->ldmParams.minMatchLength =3D (U32)value; 
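
Also visible above: the statement-like macros (BOUNDCHECK here, CLAMP_TYPE further down) are rewrapped from bare { ... } blocks into do { ... } while (0). That is the standard hygiene fix for multi-statement macros used under an unbraced if/else; a minimal illustration of the failure mode it prevents (check1/check2 are hypothetical):

    #define CHECK_BRACES(x)   { check1(x); check2(x); }            /* fragile */
    #define CHECK_DOWHILE(x)  do { check1(x); check2(x); } while (0)

    /* With the brace form, the ';' the caller writes after the macro
     * terminates the if statement early, so the else no longer pairs:
     *
     *     if (cond)
     *         CHECK_BRACES(v);    // expands to { ... } ;  -- stray ';'
     *     else                    // error: else without a previous if
     *         fallback(v);
     *
     * The do/while(0) form consumes exactly one ';' and pairs correctly. */
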
return CCtxParams->ldmParams.minMatchLength; =20 case ZSTD_c_ldmBucketSizeLog : if (value!=3D0) /* 0 =3D=3D> default */ BOUNDCHECK(ZSTD_c_ldmBucketSizeLog, value); - CCtxParams->ldmParams.bucketSizeLog =3D value; + CCtxParams->ldmParams.bucketSizeLog =3D (U32)value; return CCtxParams->ldmParams.bucketSizeLog; =20 case ZSTD_c_ldmHashRateLog : if (value!=3D0) /* 0 =3D=3D> default */ BOUNDCHECK(ZSTD_c_ldmHashRateLog, value); - CCtxParams->ldmParams.hashRateLog =3D value; + CCtxParams->ldmParams.hashRateLog =3D (U32)value; return CCtxParams->ldmParams.hashRateLog; =20 case ZSTD_c_targetCBlockSize : - if (value!=3D0) /* 0 =3D=3D> default */ + if (value!=3D0) { /* 0 =3D=3D> default */ + value =3D MAX(value, ZSTD_TARGETCBLOCKSIZE_MIN); BOUNDCHECK(ZSTD_c_targetCBlockSize, value); - CCtxParams->targetCBlockSize =3D value; + } + CCtxParams->targetCBlockSize =3D (U32)value; return CCtxParams->targetCBlockSize; =20 case ZSTD_c_srcSizeHint : if (value!=3D0) /* 0 =3D=3D> default */ BOUNDCHECK(ZSTD_c_srcSizeHint, value); CCtxParams->srcSizeHint =3D value; - return CCtxParams->srcSizeHint; + return (size_t)CCtxParams->srcSizeHint; =20 case ZSTD_c_stableInBuffer: BOUNDCHECK(ZSTD_c_stableInBuffer, value); @@ -843,28 +920,55 @@ size_t ZSTD_CCtxParams_setParameter(ZSTD_CCtx_params*= CCtxParams, =20 case ZSTD_c_blockDelimiters: BOUNDCHECK(ZSTD_c_blockDelimiters, value); - CCtxParams->blockDelimiters =3D (ZSTD_sequenceFormat_e)value; + CCtxParams->blockDelimiters =3D (ZSTD_SequenceFormat_e)value; return CCtxParams->blockDelimiters; =20 case ZSTD_c_validateSequences: BOUNDCHECK(ZSTD_c_validateSequences, value); CCtxParams->validateSequences =3D value; - return CCtxParams->validateSequences; + return (size_t)CCtxParams->validateSequences; + + case ZSTD_c_splitAfterSequences: + BOUNDCHECK(ZSTD_c_splitAfterSequences, value); + CCtxParams->postBlockSplitter =3D (ZSTD_ParamSwitch_e)value; + return CCtxParams->postBlockSplitter; =20 - case ZSTD_c_useBlockSplitter: - BOUNDCHECK(ZSTD_c_useBlockSplitter, value); - CCtxParams->useBlockSplitter =3D (ZSTD_paramSwitch_e)value; - return CCtxParams->useBlockSplitter; + case ZSTD_c_blockSplitterLevel: + BOUNDCHECK(ZSTD_c_blockSplitterLevel, value); + CCtxParams->preBlockSplitter_level =3D value; + return (size_t)CCtxParams->preBlockSplitter_level; =20 case ZSTD_c_useRowMatchFinder: BOUNDCHECK(ZSTD_c_useRowMatchFinder, value); - CCtxParams->useRowMatchFinder =3D (ZSTD_paramSwitch_e)value; + CCtxParams->useRowMatchFinder =3D (ZSTD_ParamSwitch_e)value; return CCtxParams->useRowMatchFinder; =20 case ZSTD_c_deterministicRefPrefix: BOUNDCHECK(ZSTD_c_deterministicRefPrefix, value); CCtxParams->deterministicRefPrefix =3D !!value; - return CCtxParams->deterministicRefPrefix; + return (size_t)CCtxParams->deterministicRefPrefix; + + case ZSTD_c_prefetchCDictTables: + BOUNDCHECK(ZSTD_c_prefetchCDictTables, value); + CCtxParams->prefetchCDictTables =3D (ZSTD_ParamSwitch_e)value; + return CCtxParams->prefetchCDictTables; + + case ZSTD_c_enableSeqProducerFallback: + BOUNDCHECK(ZSTD_c_enableSeqProducerFallback, value); + CCtxParams->enableMatchFinderFallback =3D value; + return (size_t)CCtxParams->enableMatchFinderFallback; + + case ZSTD_c_maxBlockSize: + if (value!=3D0) /* 0 =3D=3D> default */ + BOUNDCHECK(ZSTD_c_maxBlockSize, value); + assert(value>=3D0); + CCtxParams->maxBlockSize =3D (size_t)value; + return CCtxParams->maxBlockSize; + + case ZSTD_c_repcodeResolution: + BOUNDCHECK(ZSTD_c_repcodeResolution, value); + CCtxParams->searchForExternalRepcodes =3D (ZSTD_ParamSwitch_e)valu= e; + 
return CCtxParams->searchForExternalRepcodes; =20 default: RETURN_ERROR(parameter_unsupported, "unknown parameter"); } @@ -881,7 +985,7 @@ size_t ZSTD_CCtxParams_getParameter( switch(param) { case ZSTD_c_format : - *value =3D CCtxParams->format; + *value =3D (int)CCtxParams->format; break; case ZSTD_c_compressionLevel : *value =3D CCtxParams->compressionLevel; @@ -896,16 +1000,16 @@ size_t ZSTD_CCtxParams_getParameter( *value =3D (int)CCtxParams->cParams.chainLog; break; case ZSTD_c_searchLog : - *value =3D CCtxParams->cParams.searchLog; + *value =3D (int)CCtxParams->cParams.searchLog; break; case ZSTD_c_minMatch : - *value =3D CCtxParams->cParams.minMatch; + *value =3D (int)CCtxParams->cParams.minMatch; break; case ZSTD_c_targetLength : - *value =3D CCtxParams->cParams.targetLength; + *value =3D (int)CCtxParams->cParams.targetLength; break; case ZSTD_c_strategy : - *value =3D (unsigned)CCtxParams->cParams.strategy; + *value =3D (int)CCtxParams->cParams.strategy; break; case ZSTD_c_contentSizeFlag : *value =3D CCtxParams->fParams.contentSizeFlag; @@ -920,10 +1024,10 @@ size_t ZSTD_CCtxParams_getParameter( *value =3D CCtxParams->forceWindow; break; case ZSTD_c_forceAttachDict : - *value =3D CCtxParams->attachDictPref; + *value =3D (int)CCtxParams->attachDictPref; break; case ZSTD_c_literalCompressionMode : - *value =3D CCtxParams->literalCompressionMode; + *value =3D (int)CCtxParams->literalCompressionMode; break; case ZSTD_c_nbWorkers : assert(CCtxParams->nbWorkers =3D=3D 0); @@ -939,19 +1043,19 @@ size_t ZSTD_CCtxParams_getParameter( *value =3D CCtxParams->enableDedicatedDictSearch; break; case ZSTD_c_enableLongDistanceMatching : - *value =3D CCtxParams->ldmParams.enableLdm; + *value =3D (int)CCtxParams->ldmParams.enableLdm; break; case ZSTD_c_ldmHashLog : - *value =3D CCtxParams->ldmParams.hashLog; + *value =3D (int)CCtxParams->ldmParams.hashLog; break; case ZSTD_c_ldmMinMatch : - *value =3D CCtxParams->ldmParams.minMatchLength; + *value =3D (int)CCtxParams->ldmParams.minMatchLength; break; case ZSTD_c_ldmBucketSizeLog : - *value =3D CCtxParams->ldmParams.bucketSizeLog; + *value =3D (int)CCtxParams->ldmParams.bucketSizeLog; break; case ZSTD_c_ldmHashRateLog : - *value =3D CCtxParams->ldmParams.hashRateLog; + *value =3D (int)CCtxParams->ldmParams.hashRateLog; break; case ZSTD_c_targetCBlockSize : *value =3D (int)CCtxParams->targetCBlockSize; @@ -971,8 +1075,11 @@ size_t ZSTD_CCtxParams_getParameter( case ZSTD_c_validateSequences : *value =3D (int)CCtxParams->validateSequences; break; - case ZSTD_c_useBlockSplitter : - *value =3D (int)CCtxParams->useBlockSplitter; + case ZSTD_c_splitAfterSequences : + *value =3D (int)CCtxParams->postBlockSplitter; + break; + case ZSTD_c_blockSplitterLevel : + *value =3D CCtxParams->preBlockSplitter_level; break; case ZSTD_c_useRowMatchFinder : *value =3D (int)CCtxParams->useRowMatchFinder; @@ -980,6 +1087,18 @@ size_t ZSTD_CCtxParams_getParameter( case ZSTD_c_deterministicRefPrefix: *value =3D (int)CCtxParams->deterministicRefPrefix; break; + case ZSTD_c_prefetchCDictTables: + *value =3D (int)CCtxParams->prefetchCDictTables; + break; + case ZSTD_c_enableSeqProducerFallback: + *value =3D CCtxParams->enableMatchFinderFallback; + break; + case ZSTD_c_maxBlockSize: + *value =3D (int)CCtxParams->maxBlockSize; + break; + case ZSTD_c_repcodeResolution: + *value =3D (int)CCtxParams->searchForExternalRepcodes; + break; default: RETURN_ERROR(parameter_unsupported, "unknown parameter"); } return 0; @@ -1006,9 +1125,47 @@ size_t ZSTD_CCtx_setParametersUsingCCtxParams( 
return 0; } =20 +size_t ZSTD_CCtx_setCParams(ZSTD_CCtx* cctx, ZSTD_compressionParameters cp= arams) +{ + ZSTD_STATIC_ASSERT(sizeof(cparams) =3D=3D 7 * 4 /* all params are list= ed below */); + DEBUGLOG(4, "ZSTD_CCtx_setCParams"); + /* only update if all parameters are valid */ + FORWARD_IF_ERROR(ZSTD_checkCParams(cparams), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, (int)c= params.windowLog), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_chainLog, (int)cp= arams.chainLog), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_hashLog, (int)cpa= rams.hashLog), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_searchLog, (int)c= params.searchLog), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_minMatch, (int)cp= arams.minMatch), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_targetLength, (in= t)cparams.targetLength), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_strategy, (int)cp= arams.strategy), ""); + return 0; +} + +size_t ZSTD_CCtx_setFParams(ZSTD_CCtx* cctx, ZSTD_frameParameters fparams) +{ + ZSTD_STATIC_ASSERT(sizeof(fparams) =3D=3D 3 * 4 /* all params are list= ed below */); + DEBUGLOG(4, "ZSTD_CCtx_setFParams"); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_contentSizeFlag, = fparams.contentSizeFlag !=3D 0), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_checksumFlag, fpa= rams.checksumFlag !=3D 0), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_dictIDFlag, fpara= ms.noDictIDFlag =3D=3D 0), ""); + return 0; +} + +size_t ZSTD_CCtx_setParams(ZSTD_CCtx* cctx, ZSTD_parameters params) +{ + DEBUGLOG(4, "ZSTD_CCtx_setParams"); + /* First check cParams, because we want to update all or none. */ + FORWARD_IF_ERROR(ZSTD_checkCParams(params.cParams), ""); + /* Next set fParams, because this could fail if the cctx isn't in init= stage. */ + FORWARD_IF_ERROR(ZSTD_CCtx_setFParams(cctx, params.fParams), ""); + /* Finally set cParams, which should succeed. */ + FORWARD_IF_ERROR(ZSTD_CCtx_setCParams(cctx, params.cParams), ""); + return 0; +} + size_t ZSTD_CCtx_setPledgedSrcSize(ZSTD_CCtx* cctx, unsigned long long ple= dgedSrcSize) { - DEBUGLOG(4, "ZSTD_CCtx_setPledgedSrcSize to %u bytes", (U32)pledgedSrc= Size); + DEBUGLOG(4, "ZSTD_CCtx_setPledgedSrcSize to %llu bytes", pledgedSrcSiz= e); RETURN_ERROR_IF(cctx->streamStage !=3D zcss_init, stage_wrong, "Can't set pledgedSrcSize when not in init stage."); cctx->pledgedSrcSizePlusOne =3D pledgedSrcSize+1; @@ -1024,9 +1181,9 @@ static void ZSTD_dedicatedDictSearch_revertCParams( ZSTD_compressionParameters* cParams); =20 /* - * Initializes the local dict using the requested parameters. - * NOTE: This does not use the pledged src size, because it may be used fo= r more - * than one compression. + * Initializes the local dictionary using requested parameters. + * NOTE: Initialization does not employ the pledged src size, + * because the dictionary may be used for multiple compressions. */ static size_t ZSTD_initLocalDict(ZSTD_CCtx* cctx) { @@ -1039,8 +1196,8 @@ static size_t ZSTD_initLocalDict(ZSTD_CCtx* cctx) return 0; } if (dl->cdict !=3D NULL) { - assert(cctx->cdict =3D=3D dl->cdict); /* Local dictionary already initialized. 
 size_t ZSTD_CCtx_setPledgedSrcSize(ZSTD_CCtx* cctx, unsigned long long pledgedSrcSize)
 {
-    DEBUGLOG(4, "ZSTD_CCtx_setPledgedSrcSize to %u bytes", (U32)pledgedSrcSize);
+    DEBUGLOG(4, "ZSTD_CCtx_setPledgedSrcSize to %llu bytes", pledgedSrcSize);
    RETURN_ERROR_IF(cctx->streamStage != zcss_init, stage_wrong,
                    "Can't set pledgedSrcSize when not in init stage.");
    cctx->pledgedSrcSizePlusOne = pledgedSrcSize+1;
@@ -1024,9 +1181,9 @@ static void ZSTD_dedicatedDictSearch_revertCParams(
        ZSTD_compressionParameters* cParams);

 /*
- * Initializes the local dict using the requested parameters.
- * NOTE: This does not use the pledged src size, because it may be used for more
- * than one compression.
+ * Initializes the local dictionary using requested parameters.
+ * NOTE: Initialization does not employ the pledged src size,
+ * because the dictionary may be used for multiple compressions.
 */
 static size_t ZSTD_initLocalDict(ZSTD_CCtx* cctx)
 {
@@ -1039,8 +1196,8 @@ static size_t ZSTD_initLocalDict(ZSTD_CCtx* cctx)
        return 0;
    }
    if (dl->cdict != NULL) {
-        assert(cctx->cdict == dl->cdict);
        /* Local dictionary already initialized. */
+        assert(cctx->cdict == dl->cdict);
        return 0;
    }
    assert(dl->dictSize > 0);
@@ -1060,26 +1217,30 @@ static size_t ZSTD_initLocalDict(ZSTD_CCtx* cctx)
 }

 size_t ZSTD_CCtx_loadDictionary_advanced(
-        ZSTD_CCtx* cctx, const void* dict, size_t dictSize,
-        ZSTD_dictLoadMethod_e dictLoadMethod, ZSTD_dictContentType_e dictContentType)
+        ZSTD_CCtx* cctx,
+        const void* dict, size_t dictSize,
+        ZSTD_dictLoadMethod_e dictLoadMethod,
+        ZSTD_dictContentType_e dictContentType)
 {
-    RETURN_ERROR_IF(cctx->streamStage != zcss_init, stage_wrong,
-                    "Can't load a dictionary when ctx is not in init stage.");
    DEBUGLOG(4, "ZSTD_CCtx_loadDictionary_advanced (size: %u)", (U32)dictSize);
-    ZSTD_clearAllDicts(cctx);  /* in case one already exists */
-    if (dict == NULL || dictSize == 0)  /* no dictionary mode */
+    RETURN_ERROR_IF(cctx->streamStage != zcss_init, stage_wrong,
+                    "Can't load a dictionary when cctx is not in init stage.");
+    ZSTD_clearAllDicts(cctx);  /* erase any previously set dictionary */
+    if (dict == NULL || dictSize == 0)  /* no dictionary */
        return 0;
    if (dictLoadMethod == ZSTD_dlm_byRef) {
        cctx->localDict.dict = dict;
    } else {
+        /* copy dictionary content inside CCtx to own its lifetime */
        void* dictBuffer;
        RETURN_ERROR_IF(cctx->staticSize, memory_allocation,
-                        "no malloc for static CCtx");
+                        "static CCtx can't allocate for an internal copy of dictionary");
        dictBuffer = ZSTD_customMalloc(dictSize, cctx->customMem);
-        RETURN_ERROR_IF(!dictBuffer, memory_allocation, "NULL pointer!");
+        RETURN_ERROR_IF(dictBuffer==NULL, memory_allocation,
+                        "allocation failed for dictionary content");
        ZSTD_memcpy(dictBuffer, dict, dictSize);
-        cctx->localDict.dictBuffer = dictBuffer;
-        cctx->localDict.dict = dictBuffer;
+        cctx->localDict.dictBuffer = dictBuffer;   /* owned ptr to free */
+        cctx->localDict.dict = dictBuffer;         /* read-only reference */
    }
    cctx->localDict.dictSize = dictSize;
    cctx->localDict.dictContentType = dictContentType;
@@ -1149,7 +1310,7 @@ size_t ZSTD_CCtx_reset(ZSTD_CCtx* cctx, ZSTD_ResetDirective reset)
    if ( (reset == ZSTD_reset_parameters)
      || (reset == ZSTD_reset_session_and_parameters) ) {
        RETURN_ERROR_IF(cctx->streamStage != zcss_init, stage_wrong,
-                        "Can't reset parameters only when not in init stage.");
+                        "Reset parameters is only possible during init stage.");
        ZSTD_clearAllDicts(cctx);
        return ZSTD_CCtxParams_reset(&cctx->requestedParams);
    }
@@ -1168,7 +1329,7 @@ size_t ZSTD_checkCParams(ZSTD_compressionParameters cParams)
    BOUNDCHECK(ZSTD_c_searchLog,   (int)cParams.searchLog);
    BOUNDCHECK(ZSTD_c_minMatch,    (int)cParams.minMatch);
    BOUNDCHECK(ZSTD_c_targetLength,(int)cParams.targetLength);
-    BOUNDCHECK(ZSTD_c_strategy,    cParams.strategy);
+    BOUNDCHECK(ZSTD_c_strategy,    (int)cParams.strategy);
    return 0;
 }

@@ -1178,11 +1339,12 @@ size_t ZSTD_checkCParams(ZSTD_compressionParameters cParams)
 static ZSTD_compressionParameters
 ZSTD_clampCParams(ZSTD_compressionParameters cParams)
 {
-#   define CLAMP_TYPE(cParam, val, type) {                                \
-        ZSTD_bounds const bounds = ZSTD_cParam_getBounds(cParam);         \
-        if ((int)val<bounds.lowerBound) val=(type)bounds.lowerBound;      \
-        else if ((int)val>bounds.upperBound) val=(type)bounds.upperBound; \
-    }
+#   define CLAMP_TYPE(cParam, val, type)                                  \
+    do {                                                                  \
+        ZSTD_bounds const bounds = ZSTD_cParam_getBounds(cParam);         \
+        if ((int)val<bounds.lowerBound) val=(type)bounds.lowerBound;      \
+        else if ((int)val>bounds.upperBound) val=(type)bounds.upperBound; \
+    } while (0)
 #   define CLAMP(cParam, val) CLAMP_TYPE(cParam, val, unsigned)
    CLAMP(ZSTD_c_windowLog, cParams.windowLog);
    CLAMP(ZSTD_c_chainLog,  cParams.chainLog);
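The CLAMP_TYPE change above is pure macro hygiene: wrapping the body in do { ... } while (0) makes the expansion behave like a single statement, so a trailing semicolon parses correctly everywhere. A minimal sketch of the failure mode the old brace-only form has (illustrative names, not from the patch):

    #define BAD(x)  { if ((x) < 0) (x) = 0; }
    #define GOOD(x) do { if ((x) < 0) (x) = 0; } while (0)

    if (cond)
        BAD(v);   /* expands to {...}; - the stray ';' terminates the 'if',
                   * so a following 'else' fails to compile */
    else
        other();

    if (cond)
        GOOD(v);  /* a single statement: 'else' binds as expected */
    else
        other();
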
@@ -1240,19 +1402,62 @@ static U32 ZSTD_dictAndWindowLog(U32 windowLog, U64 srcSize, U64 dictSize)
  * optimize `cPar` for a specified input (`srcSize` and `dictSize`).
  * mostly downsize to reduce memory consumption and initialization latency.
  * `srcSize` can be ZSTD_CONTENTSIZE_UNKNOWN when not known.
- * `mode` is the mode for parameter adjustment. See docs for `ZSTD_cParamMode_e`.
+ * `mode` is the mode for parameter adjustment. See docs for `ZSTD_CParamMode_e`.
  *  note : `srcSize==0` means 0!
  *  condition : cPar is presumed validated (can be checked using ZSTD_checkCParams()).
  */
 static ZSTD_compressionParameters
 ZSTD_adjustCParams_internal(ZSTD_compressionParameters cPar,
                            unsigned long long srcSize,
                            size_t dictSize,
-                            ZSTD_cParamMode_e mode)
+                            ZSTD_CParamMode_e mode,
+                            ZSTD_ParamSwitch_e useRowMatchFinder)
 {
    const U64 minSrcSize = 513; /* (1<<9) + 1 */
    const U64 maxWindowResize = 1ULL << (ZSTD_WINDOWLOG_MAX-1);
    assert(ZSTD_checkCParams(cPar)==0);

+    /* Cascade the selected strategy down to the next-highest one built into
+     * this binary. */
+#ifdef ZSTD_EXCLUDE_BTULTRA_BLOCK_COMPRESSOR
+    if (cPar.strategy == ZSTD_btultra2) {
+        cPar.strategy = ZSTD_btultra;
+    }
+    if (cPar.strategy == ZSTD_btultra) {
+        cPar.strategy = ZSTD_btopt;
+    }
+#endif
+#ifdef ZSTD_EXCLUDE_BTOPT_BLOCK_COMPRESSOR
+    if (cPar.strategy == ZSTD_btopt) {
+        cPar.strategy = ZSTD_btlazy2;
+    }
+#endif
+#ifdef ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR
+    if (cPar.strategy == ZSTD_btlazy2) {
+        cPar.strategy = ZSTD_lazy2;
+    }
+#endif
+#ifdef ZSTD_EXCLUDE_LAZY2_BLOCK_COMPRESSOR
+    if (cPar.strategy == ZSTD_lazy2) {
+        cPar.strategy = ZSTD_lazy;
+    }
+#endif
+#ifdef ZSTD_EXCLUDE_LAZY_BLOCK_COMPRESSOR
+    if (cPar.strategy == ZSTD_lazy) {
+        cPar.strategy = ZSTD_greedy;
+    }
+#endif
+#ifdef ZSTD_EXCLUDE_GREEDY_BLOCK_COMPRESSOR
+    if (cPar.strategy == ZSTD_greedy) {
+        cPar.strategy = ZSTD_dfast;
+    }
+#endif
+#ifdef ZSTD_EXCLUDE_DFAST_BLOCK_COMPRESSOR
+    if (cPar.strategy == ZSTD_dfast) {
+        cPar.strategy = ZSTD_fast;
+        cPar.targetLength = 0;
+    }
+#endif
+
    switch (mode) {
    case ZSTD_cpm_unknown:
    case ZSTD_cpm_noAttachDict:
@@ -1281,8 +1486,8 @@ ZSTD_adjustCParams_internal(ZSTD_compressionParameters cPar,
    }

    /* resize windowLog if input is small enough, to use less memory */
-    if ( (srcSize < maxWindowResize)
-      && (dictSize < maxWindowResize) ) {
+    if ( (srcSize <= maxWindowResize)
+      && (dictSize <= maxWindowResize) ) {
        U32 const tSize = (U32)(srcSize + dictSize);
        static U32 const hashSizeMin = 1 << ZSTD_HASHLOG_MIN;
        U32 const srcLog = (tSize < hashSizeMin) ? ZSTD_HASHLOG_MIN :
@@ -1300,6 +1505,42 @@ ZSTD_adjustCParams_internal(ZSTD_compressionParameters cPar,
    if (cPar.windowLog < ZSTD_WINDOWLOG_ABSOLUTEMIN)
        cPar.windowLog = ZSTD_WINDOWLOG_ABSOLUTEMIN;  /* minimum wlog required for valid frame header */

+    /* We can't use more than 32 bits of hash in total, so that means that we require:
+     * (hashLog + 8) <= 32 && (chainLog + 8) <= 32
+     */
+    if (mode == ZSTD_cpm_createCDict && ZSTD_CDictIndicesAreTagged(&cPar)) {
+        U32 const maxShortCacheHashLog = 32 - ZSTD_SHORT_CACHE_TAG_BITS;
+        if (cPar.hashLog > maxShortCacheHashLog) {
+            cPar.hashLog = maxShortCacheHashLog;
+        }
+        if (cPar.chainLog > maxShortCacheHashLog) {
+            cPar.chainLog = maxShortCacheHashLog;
+        }
+    }
+
+    /* At this point, we aren't 100% sure if we are using the row match finder.
+     * Unless it is explicitly disabled, conservatively assume that it is enabled.
+     * In this case it will only be disabled for small sources, so shrinking the
+     * hash log a little bit shouldn't result in any ratio loss.
+     */
+    if (useRowMatchFinder == ZSTD_ps_auto)
+        useRowMatchFinder = ZSTD_ps_enable;
+
+    /* We can't hash more than 32-bits in total. So that means that we require:
+     * (hashLog - rowLog + 8) <= 32
+     */
+    if (ZSTD_rowMatchFinderUsed(cPar.strategy, useRowMatchFinder)) {
+        /* Switch to 32-entry rows if searchLog is 5 (or more) */
+        U32 const rowLog = BOUNDED(4, cPar.searchLog, 6);
+        U32 const maxRowHashLog = 32 - ZSTD_ROW_HASH_TAG_BITS;
+        U32 const maxHashLog = maxRowHashLog + rowLog;
+        assert(cPar.hashLog >= rowLog);
+        if (cPar.hashLog > maxHashLog) {
+            cPar.hashLog = maxHashLog;
+        }
+    }
+
    return cPar;
 }

@@ -1310,11 +1551,11 @@ ZSTD_adjustCParams(ZSTD_compressionParameters cPar,
 {
    cPar = ZSTD_clampCParams(cPar);   /* resulting cPar is necessarily valid (all parameters within range) */
    if (srcSize == 0) srcSize = ZSTD_CONTENTSIZE_UNKNOWN;
-    return ZSTD_adjustCParams_internal(cPar, srcSize, dictSize, ZSTD_cpm_unknown);
+    return ZSTD_adjustCParams_internal(cPar, srcSize, dictSize, ZSTD_cpm_unknown, ZSTD_ps_auto);
 }

-static ZSTD_compressionParameters ZSTD_getCParams_internal(int compressionLevel, unsigned long long srcSizeHint, size_t dictSize, ZSTD_cParamMode_e mode);
-static ZSTD_parameters ZSTD_getParams_internal(int compressionLevel, unsigned long long srcSizeHint, size_t dictSize, ZSTD_cParamMode_e mode);
+static ZSTD_compressionParameters ZSTD_getCParams_internal(int compressionLevel, unsigned long long srcSizeHint, size_t dictSize, ZSTD_CParamMode_e mode);
+static ZSTD_parameters ZSTD_getParams_internal(int compressionLevel, unsigned long long srcSizeHint, size_t dictSize, ZSTD_CParamMode_e mode);

 static void ZSTD_overrideCParams(
        ZSTD_compressionParameters* cParams,
@@ -1330,24 +1571,25 @@ static void ZSTD_overrideCParams(
 }

 ZSTD_compressionParameters ZSTD_getCParamsFromCCtxParams(
-        const ZSTD_CCtx_params* CCtxParams, U64 srcSizeHint, size_t dictSize, ZSTD_cParamMode_e mode)
+        const ZSTD_CCtx_params* CCtxParams, U64 srcSizeHint, size_t dictSize, ZSTD_CParamMode_e mode)
 {
    ZSTD_compressionParameters cParams;
    if (srcSizeHint == ZSTD_CONTENTSIZE_UNKNOWN && CCtxParams->srcSizeHint > 0) {
-        srcSizeHint = CCtxParams->srcSizeHint;
+        assert(CCtxParams->srcSizeHint>=0);
+        srcSizeHint = (U64)CCtxParams->srcSizeHint;
    }
    cParams = ZSTD_getCParams_internal(CCtxParams->compressionLevel, srcSizeHint, dictSize, mode);
    if (CCtxParams->ldmParams.enableLdm == ZSTD_ps_enable) cParams.windowLog = ZSTD_LDM_DEFAULT_WINDOW_LOG;
    ZSTD_overrideCParams(&cParams, &CCtxParams->cParams);
    assert(!ZSTD_checkCParams(cParams));
    /* srcSizeHint == 0 means 0 */
-    return ZSTD_adjustCParams_internal(cParams, srcSizeHint, dictSize, mode);
+    return ZSTD_adjustCParams_internal(cParams, srcSizeHint, dictSize, mode, CCtxParams->useRowMatchFinder);
 }

 static size_t
 ZSTD_sizeof_matchState(const ZSTD_compressionParameters* const cParams,
-                       const ZSTD_paramSwitch_e useRowMatchFinder,
-                       const U32 enableDedicatedDictSearch,
+                       const ZSTD_ParamSwitch_e useRowMatchFinder,
+                       const int enableDedicatedDictSearch,
                       const U32 forCCtx)
 {
    /* chain table size should be 0 for fast or row-hash strategies */
@@ -1363,14 +1605,14 @@ ZSTD_sizeof_matchState(const ZSTD_compressionParameters* const cParams,
                            + hSize * sizeof(U32)
                            + h3Size * sizeof(U32);
    size_t const optPotentialSpace =
-        ZSTD_cwksp_aligned_alloc_size((MaxML+1) * sizeof(U32))
-      + ZSTD_cwksp_aligned_alloc_size((MaxLL+1) * sizeof(U32))
-      + ZSTD_cwksp_aligned_alloc_size((MaxOff+1) * sizeof(U32))
-      + ZSTD_cwksp_aligned_alloc_size((1<<Litbits) * sizeof(U32))
-      + ZSTD_cwksp_aligned_alloc_size((ZSTD_OPT_NUM+1) * sizeof(ZSTD_match_t))
-      + ZSTD_cwksp_aligned_alloc_size((ZSTD_OPT_NUM+1) * sizeof(ZSTD_optimal_t));
+        ZSTD_cwksp_aligned64_alloc_size((MaxML+1) * sizeof(U32))
+      + ZSTD_cwksp_aligned64_alloc_size((MaxLL+1) * sizeof(U32))
+      + ZSTD_cwksp_aligned64_alloc_size((MaxOff+1) * sizeof(U32))
+      + ZSTD_cwksp_aligned64_alloc_size((1<<Litbits) * sizeof(U32))
+      + ZSTD_cwksp_aligned64_alloc_size(ZSTD_OPT_SIZE * sizeof(ZSTD_match_t))
+      + ZSTD_cwksp_aligned64_alloc_size(ZSTD_OPT_SIZE * sizeof(ZSTD_optimal_t));
    size_t const lazyAdditionalSpace = ZSTD_rowMatchFinderUsed(cParams->strategy, useRowMatchFinder)
-                                            ? ZSTD_cwksp_aligned_alloc_size(hSize*sizeof(U16))
+                                            ? ZSTD_cwksp_aligned64_alloc_size(hSize)
                                            : 0;
    size_t const optSpace = (forCCtx && (cParams->strategy >= ZSTD_btopt))
                                ? optPotentialSpace
@@ -1386,30 +1628,38 @@ ZSTD_sizeof_matchState(const ZSTD_compressionParameters* const cParams,
    return tableSpace + optSpace + slackSpace + lazyAdditionalSpace;
 }

+/* Helper function for calculating memory requirements.
+ * Gives a tighter bound than ZSTD_sequenceBound() by taking minMatch into account. */
+static size_t ZSTD_maxNbSeq(size_t blockSize, unsigned minMatch, int useSequenceProducer) {
+    U32 const divider = (minMatch==3 || useSequenceProducer) ? 3 : 4;
+    return blockSize / divider;
+}
+
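The divider in ZSTD_maxNbSeq() reflects the smallest sequence the parser can emit: each sequence covers at least minMatch bytes of input (3 when minMatch==3 or when an external sequence producer may emit 3-byte matches, otherwise 4). Worked arithmetic for the default 128 KB block (illustrative values, not from the patch):

    ZSTD_maxNbSeq(131072, 4, 0);  /* -> 131072 / 4 = 32768 sequences */
    ZSTD_maxNbSeq(131072, 3, 0);  /* -> 131072 / 3 = 43690 sequences */

so enabling minMatch==3 or a sequence producer grows the per-block sequence buffers by a third.
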
 static size_t ZSTD_estimateCCtxSize_usingCCtxParams_internal(
        const ZSTD_compressionParameters* cParams,
        const ldmParams_t* ldmParams,
        const int isStatic,
-        const ZSTD_paramSwitch_e useRowMatchFinder,
+        const ZSTD_ParamSwitch_e useRowMatchFinder,
        const size_t buffInSize,
        const size_t buffOutSize,
-        const U64 pledgedSrcSize)
+        const U64 pledgedSrcSize,
+        int useSequenceProducer,
+        size_t maxBlockSize)
 {
    size_t const windowSize = (size_t) BOUNDED(1ULL, 1ULL << cParams->windowLog, pledgedSrcSize);
-    size_t const blockSize = MIN(ZSTD_BLOCKSIZE_MAX, windowSize);
-    U32    const divider = (cParams->minMatch==3) ? 3 : 4;
-    size_t const maxNbSeq = blockSize / divider;
+    size_t const blockSize = MIN(ZSTD_resolveMaxBlockSize(maxBlockSize), windowSize);
+    size_t const maxNbSeq = ZSTD_maxNbSeq(blockSize, cParams->minMatch, useSequenceProducer);
    size_t const tokenSpace = ZSTD_cwksp_alloc_size(WILDCOPY_OVERLENGTH + blockSize)
-                            + ZSTD_cwksp_aligned_alloc_size(maxNbSeq * sizeof(seqDef))
+                            + ZSTD_cwksp_aligned64_alloc_size(maxNbSeq * sizeof(SeqDef))
                            + 3 * ZSTD_cwksp_alloc_size(maxNbSeq * sizeof(BYTE));
-    size_t const entropySpace = ZSTD_cwksp_alloc_size(ENTROPY_WORKSPACE_SIZE);
+    size_t const tmpWorkSpace = ZSTD_cwksp_alloc_size(TMP_WORKSPACE_SIZE);
    size_t const blockStateSpace = 2 * ZSTD_cwksp_alloc_size(sizeof(ZSTD_compressedBlockState_t));
    size_t const matchStateSize = ZSTD_sizeof_matchState(cParams, useRowMatchFinder, /* enableDedicatedDictSearch */ 0, /* forCCtx */ 1);

    size_t const ldmSpace = ZSTD_ldm_getTableSize(*ldmParams);
    size_t const maxNbLdmSeq = ZSTD_ldm_getMaxNbSeq(*ldmParams, blockSize);
    size_t const ldmSeqSpace = ldmParams->enableLdm == ZSTD_ps_enable ?
-        ZSTD_cwksp_aligned_alloc_size(maxNbLdmSeq * sizeof(rawSeq)) : 0;
+        ZSTD_cwksp_aligned64_alloc_size(maxNbLdmSeq * sizeof(rawSeq)) : 0;


    size_t const bufferSpace = ZSTD_cwksp_alloc_size(buffInSize)
@@ -1417,15 +1667,21 @@ static size_t ZSTD_estimateCCtxSize_usingCCtxParams_internal(

    size_t const cctxSpace = isStatic ? ZSTD_cwksp_alloc_size(sizeof(ZSTD_CCtx)) : 0;

+    size_t const maxNbExternalSeq = ZSTD_sequenceBound(blockSize);
+    size_t const externalSeqSpace = useSequenceProducer
+        ? ZSTD_cwksp_aligned64_alloc_size(maxNbExternalSeq * sizeof(ZSTD_Sequence))
+        : 0;
+
    size_t const neededSpace =
        cctxSpace +
-        entropySpace +
+        tmpWorkSpace +
        blockStateSpace +
        ldmSpace +
        ldmSeqSpace +
        matchStateSize +
        tokenSpace +
-        bufferSpace;
+        bufferSpace +
+        externalSeqSpace;

    DEBUGLOG(5, "estimate workspace : %u", (U32)neededSpace);
    return neededSpace;
@@ -1435,7 +1691,7 @@ size_t ZSTD_estimateCCtxSize_usingCCtxParams(const ZSTD_CCtx_params* params)
 {
    ZSTD_compressionParameters const cParams =
                ZSTD_getCParamsFromCCtxParams(params, ZSTD_CONTENTSIZE_UNKNOWN, 0, ZSTD_cpm_noAttachDict);
-    ZSTD_paramSwitch_e const useRowMatchFinder = ZSTD_resolveRowMatchFinderMode(params->useRowMatchFinder,
+    ZSTD_ParamSwitch_e const useRowMatchFinder = ZSTD_resolveRowMatchFinderMode(params->useRowMatchFinder,
                                                                                &cParams);

    RETURN_ERROR_IF(params->nbWorkers > 0, GENERIC, "Estimate CCtx size is supported for single-threaded compression only.");
@@ -1443,7 +1699,7 @@ size_t ZSTD_estimateCCtxSize_usingCCtxParams(const ZSTD_CCtx_params* params)
     * be needed. However, we still allocate two 0-sized buffers, which can
     * take space under ASAN. */
    return ZSTD_estimateCCtxSize_usingCCtxParams_internal(
-        &cParams, &params->ldmParams, 1, useRowMatchFinder, 0, 0, ZSTD_CONTENTSIZE_UNKNOWN);
+        &cParams, &params->ldmParams, 1, useRowMatchFinder, 0, 0, ZSTD_CONTENTSIZE_UNKNOWN, ZSTD_hasExtSeqProd(params), params->maxBlockSize);
 }

 size_t ZSTD_estimateCCtxSize_usingCParams(ZSTD_compressionParameters cParams)
@@ -1493,18 +1749,18 @@ size_t ZSTD_estimateCStreamSize_usingCCtxParams(const ZSTD_CCtx_params* params)
    RETURN_ERROR_IF(params->nbWorkers > 0, GENERIC, "Estimate CCtx size is supported for single-threaded compression only.");
    {   ZSTD_compressionParameters const cParams =
                ZSTD_getCParamsFromCCtxParams(params, ZSTD_CONTENTSIZE_UNKNOWN, 0, ZSTD_cpm_noAttachDict);
-        size_t const blockSize = MIN(ZSTD_BLOCKSIZE_MAX, (size_t)1 << cParams.windowLog);
+        size_t const blockSize = MIN(ZSTD_resolveMaxBlockSize(params->maxBlockSize), (size_t)1 << cParams.windowLog);
        size_t const inBuffSize = (params->inBufferMode == ZSTD_bm_buffered)
                ? ((size_t)1 << cParams.windowLog) + blockSize
                : 0;
        size_t const outBuffSize = (params->outBufferMode == ZSTD_bm_buffered)
                ? ZSTD_compressBound(blockSize) + 1
                : 0;
-        ZSTD_paramSwitch_e const useRowMatchFinder = ZSTD_resolveRowMatchFinderMode(params->useRowMatchFinder, &params->cParams);
+        ZSTD_ParamSwitch_e const useRowMatchFinder = ZSTD_resolveRowMatchFinderMode(params->useRowMatchFinder, &params->cParams);

        return ZSTD_estimateCCtxSize_usingCCtxParams_internal(
            &cParams, &params->ldmParams, 1, useRowMatchFinder, inBuffSize, outBuffSize,
-            ZSTD_CONTENTSIZE_UNKNOWN);
+            ZSTD_CONTENTSIZE_UNKNOWN, ZSTD_hasExtSeqProd(params), params->maxBlockSize);
    }
 }

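These estimators exist so callers can size a workspace up front, e.g. for a statically allocated context. A hedged sketch of the intended pattern, using public API names (user-side code; the kernel would go through its zstd_* wrappers and kmalloc/vmalloc instead):

    ZSTD_CCtx_params* const params = ZSTD_createCCtxParams();
    ZSTD_CCtxParams_setParameter(params, ZSTD_c_compressionLevel, 3);
    {   size_t const wkspSize = ZSTD_estimateCCtxSize_usingCCtxParams(params);
        void* const wksp = malloc(wkspSize);
        ZSTD_CCtx* const cctx = ZSTD_initStaticCCtx(wksp, wkspSize);
        /* ... compress with cctx; no further allocation should occur ... */
    }

Note the new tail arguments: the estimate is only valid if the eventual compression uses the same maxBlockSize and external-sequence-producer settings that were passed in here.
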
@@ -1600,7 +1856,7 @@ void ZSTD_reset_compressedBlockState(ZSTD_compressedBlockState_t* bs)
  * Invalidate all the matches in the match finder tables.
  * Requires nextSrc and base to be set (can be NULL).
  */
-static void ZSTD_invalidateMatchState(ZSTD_matchState_t* ms)
+static void ZSTD_invalidateMatchState(ZSTD_MatchState_t* ms)
 {
    ZSTD_window_clear(&ms->window);

@@ -1637,12 +1893,25 @@ typedef enum {
    ZSTD_resetTarget_CCtx
 } ZSTD_resetTarget_e;

+/* Mixes the bits of a 64-bit value, based on XXH3_rrmxmx */
+static U64 ZSTD_bitmix(U64 val, U64 len) {
+    val ^= ZSTD_rotateRight_U64(val, 49) ^ ZSTD_rotateRight_U64(val, 24);
+    val *= 0x9FB21C651E98DF25ULL;
+    val ^= (val >> 35) + len;
+    val *= 0x9FB21C651E98DF25ULL;
+    return val ^ (val >> 28);
+}
+
+/* Mixes in the hashSalt and hashSaltEntropy to create a new hashSalt */
+static void ZSTD_advanceHashSalt(ZSTD_MatchState_t* ms) {
+    ms->hashSalt = ZSTD_bitmix(ms->hashSalt, 8) ^ ZSTD_bitmix((U64) ms->hashSaltEntropy, 4);
+}

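ZSTD_bitmix() is a small avalanche mixer; advancing the salt through it means each CCtx reset gets a fresh, well-distributed salt for the row matchfinder's tag hashes, so stale tags left behind in an init-once table cannot systematically alias the new session's tags. A self-contained illustration of the recurrence (hypothetical values, assuming the two functions above):

    /* illustrative only: how the salt evolves across two CCtx resets */
    U64 salt = 0;                      /* initial hashSalt                */
    U64 const entropy = 0x12345678u;   /* hypothetical hashSaltEntropy    */
    salt = ZSTD_bitmix(salt, 8) ^ ZSTD_bitmix(entropy, 4);  /* reset #1   */
    salt = ZSTD_bitmix(salt, 8) ^ ZSTD_bitmix(entropy, 4);  /* reset #2:
                                        * different, decorrelated salt    */

This is what lets the CCtx path below skip the tag-table memset that the CDict path still performs (CDicts use salt 0 so their tag tables stay deterministic and copyable).
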
 static size_t
-ZSTD_reset_matchState(ZSTD_matchState_t* ms,
+ZSTD_reset_matchState(ZSTD_MatchState_t* ms,
                      ZSTD_cwksp* ws,
                const ZSTD_compressionParameters* cParams,
-                const ZSTD_paramSwitch_e useRowMatchFinder,
+                const ZSTD_ParamSwitch_e useRowMatchFinder,
                const ZSTD_compResetPolicy_e crp,
                const ZSTD_indexResetPolicy_e forceResetIndex,
                const ZSTD_resetTarget_e forWho)
@@ -1664,6 +1933,7 @@ ZSTD_reset_matchState(ZSTD_matchState_t* ms,
    }

    ms->hashLog3 = hashLog3;
+    ms->lazySkipping = 0;

    ZSTD_invalidateMatchState(ms);

@@ -1685,22 +1955,19 @@ ZSTD_reset_matchState(ZSTD_matchState_t* ms,
        ZSTD_cwksp_clean_tables(ws);
    }

    if (ZSTD_rowMatchFinderUsed(cParams->strategy, useRowMatchFinder)) {
-        {   /* Row match finder needs an additional table of hashes ("tags") */
-            size_t const tagTableSize = hSize*sizeof(U16);
-            ms->tagTable = (U16*)ZSTD_cwksp_reserve_aligned(ws, tagTableSize);
-            if (ms->tagTable) ZSTD_memset(ms->tagTable, 0, tagTableSize);
+        /* Row match finder needs an additional table of hashes ("tags") */
+        size_t const tagTableSize = hSize;
+        /* We want to generate a new salt in case we reset a Cctx, but we always want to use
+         * 0 when we reset a Cdict */
+        if (forWho == ZSTD_resetTarget_CCtx) {
+            ms->tagTable = (BYTE*) ZSTD_cwksp_reserve_aligned_init_once(ws, tagTableSize);
+            ZSTD_advanceHashSalt(ms);
+        } else {
+            /* When we are not salting we want to always memset the memory */
+            ms->tagTable = (BYTE*) ZSTD_cwksp_reserve_aligned64(ws, tagTableSize);
+            ZSTD_memset(ms->tagTable, 0, tagTableSize);
+            ms->hashSalt = 0;
+        }
        {   /* Switch to 32-entry rows if searchLog is 5 (or more) */
            U32 const rowLog = BOUNDED(4, cParams->searchLog, 6);
@@ -1709,6 +1976,17 @@ ZSTD_reset_matchState(ZSTD_matchState_t* ms,
        }
    }

+    /* opt parser space */
+    if ((forWho == ZSTD_resetTarget_CCtx) && (cParams->strategy >= ZSTD_btopt)) {
+        DEBUGLOG(4, "reserving optimal parser space");
+        ms->opt.litFreq = (unsigned*)ZSTD_cwksp_reserve_aligned64(ws, (1<<Litbits) * sizeof(unsigned));
+        ms->opt.litLengthFreq = (unsigned*)ZSTD_cwksp_reserve_aligned64(ws, (MaxLL+1) * sizeof(unsigned));
+        ms->opt.matchLengthFreq = (unsigned*)ZSTD_cwksp_reserve_aligned64(ws, (MaxML+1) * sizeof(unsigned));
+        ms->opt.offCodeFreq = (unsigned*)ZSTD_cwksp_reserve_aligned64(ws, (MaxOff+1) * sizeof(unsigned));
+        ms->opt.matchTable = (ZSTD_match_t*)ZSTD_cwksp_reserve_aligned64(ws, ZSTD_OPT_SIZE * sizeof(ZSTD_match_t));
+        ms->opt.priceTable = (ZSTD_optimal_t*)ZSTD_cwksp_reserve_aligned64(ws, ZSTD_OPT_SIZE * sizeof(ZSTD_optimal_t));
+    }
+
    ms->cParams = *cParams;

    RETURN_ERROR_IF(ZSTD_cwksp_reserve_failed(ws), memory_allocation,
@@ -1754,7 +2032,7 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
 {
    ZSTD_cwksp* const ws = &zc->workspace;
    DEBUGLOG(4, "ZSTD_resetCCtx_internal: pledgedSrcSize=%u, wlog=%u, useRowMatchFinder=%d useBlockSplitter=%d",
-                (U32)pledgedSrcSize, params->cParams.windowLog, (int)params->useRowMatchFinder, (int)params->useBlockSplitter);
+                (U32)pledgedSrcSize, params->cParams.windowLog, (int)params->useRowMatchFinder, (int)params->postBlockSplitter);
    assert(!ZSTD_isError(ZSTD_checkCParams(params->cParams)));

    zc->isFirstBlock = 1;
@@ -1766,8 +2044,9 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
    params = &zc->appliedParams;

    assert(params->useRowMatchFinder != ZSTD_ps_auto);
-    assert(params->useBlockSplitter != ZSTD_ps_auto);
+    assert(params->postBlockSplitter != ZSTD_ps_auto);
    assert(params->ldmParams.enableLdm != ZSTD_ps_auto);
+    assert(params->maxBlockSize != 0);
    if (params->ldmParams.enableLdm == ZSTD_ps_enable) {
        /* Adjust long distance matching parameters */
        ZSTD_ldm_adjustParameters(&zc->appliedParams.ldmParams, &params->cParams);
@@ -1776,9 +2055,8 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
    }

    {   size_t const windowSize = MAX(1, (size_t)MIN(((U64)1 << params->cParams.windowLog), pledgedSrcSize));
-        size_t const blockSize = MIN(ZSTD_BLOCKSIZE_MAX, windowSize);
-        U32    const divider = (params->cParams.minMatch==3) ? 3 : 4;
-        size_t const maxNbSeq = blockSize / divider;
+        size_t const blockSize = MIN(params->maxBlockSize, windowSize);
+        size_t const maxNbSeq = ZSTD_maxNbSeq(blockSize, params->cParams.minMatch, ZSTD_hasExtSeqProd(params));
        size_t const buffOutSize = (zbuff == ZSTDb_buffered && params->outBufferMode == ZSTD_bm_buffered)
                ? ZSTD_compressBound(blockSize) + 1
                : 0;
@@ -1795,8 +2073,7 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
        size_t const neededSpace =
            ZSTD_estimateCCtxSize_usingCCtxParams_internal(
                &params->cParams, &params->ldmParams, zc->staticSize != 0, params->useRowMatchFinder,
-                buffInSize, buffOutSize, pledgedSrcSize);
-        int resizeWorkspace;
+                buffInSize, buffOutSize, pledgedSrcSize, ZSTD_hasExtSeqProd(params), params->maxBlockSize);

        FORWARD_IF_ERROR(neededSpace, "cctx size estimate failed!");

@@ -1805,7 +2082,7 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
        {   /* Check if workspace is large enough, alloc a new one if needed */
            int const workspaceTooSmall = ZSTD_cwksp_sizeof(ws) < neededSpace;
            int const workspaceWasteful = ZSTD_cwksp_check_wasteful(ws, neededSpace);
-            resizeWorkspace = workspaceTooSmall || workspaceWasteful;
+            int resizeWorkspace = workspaceTooSmall || workspaceWasteful;
            DEBUGLOG(4, "Need %zu B workspace", neededSpace);
            DEBUGLOG(4, "windowSize: %zu - blockSize: %zu", windowSize, blockSize);

@@ -1823,21 +2100,23 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,

                DEBUGLOG(5, "reserving object space");
                /* Statically sized space.
-                 * entropyWorkspace never moves,
+                 * tmpWorkspace never moves,
                 * though prev/next block swap places */
                assert(ZSTD_cwksp_check_available(ws, 2 * sizeof(ZSTD_compressedBlockState_t)));
                zc->blockState.prevCBlock = (ZSTD_compressedBlockState_t*) ZSTD_cwksp_reserve_object(ws, sizeof(ZSTD_compressedBlockState_t));
                RETURN_ERROR_IF(zc->blockState.prevCBlock == NULL, memory_allocation, "couldn't allocate prevCBlock");
                zc->blockState.nextCBlock = (ZSTD_compressedBlockState_t*) ZSTD_cwksp_reserve_object(ws, sizeof(ZSTD_compressedBlockState_t));
                RETURN_ERROR_IF(zc->blockState.nextCBlock == NULL, memory_allocation, "couldn't allocate nextCBlock");
-                zc->entropyWorkspace = (U32*) ZSTD_cwksp_reserve_object(ws, ENTROPY_WORKSPACE_SIZE);
-                RETURN_ERROR_IF(zc->entropyWorkspace == NULL, memory_allocation, "couldn't allocate entropyWorkspace");
+                zc->tmpWorkspace = ZSTD_cwksp_reserve_object(ws, TMP_WORKSPACE_SIZE);
+                RETURN_ERROR_IF(zc->tmpWorkspace == NULL, memory_allocation, "couldn't allocate tmpWorkspace");
+                zc->tmpWkspSize = TMP_WORKSPACE_SIZE;
        }   }

        ZSTD_cwksp_clear(ws);

        /* init params */
        zc->blockState.matchState.cParams = params->cParams;
+        zc->blockState.matchState.prefetchCDictTables = params->prefetchCDictTables == ZSTD_ps_enable;
        zc->pledgedSrcSizePlusOne = pledgedSrcSize+1;
        zc->consumedSrcSize = 0;
        zc->producedCSize = 0;
@@ -1845,7 +2124,7 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
            zc->appliedParams.fParams.contentSizeFlag = 0;
        DEBUGLOG(4, "pledged content size : %u ; flag : %u",
            (unsigned)pledgedSrcSize, zc->appliedParams.fParams.contentSizeFlag);
-        zc->blockSize = blockSize;
+        zc->blockSizeMax = blockSize;

        xxh64_reset(&zc->xxhState, 0);
        zc->stage = ZSTDcs_init;
@@ -1854,13 +2133,46 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,

        ZSTD_reset_compressedBlockState(zc->blockState.prevCBlock);

+        FORWARD_IF_ERROR(ZSTD_reset_matchState(
+            &zc->blockState.matchState,
+            ws,
+            &params->cParams,
+            params->useRowMatchFinder,
+            crp,
+            needsIndexReset,
+            ZSTD_resetTarget_CCtx), "");
+
+        zc->seqStore.sequencesStart = (SeqDef*)ZSTD_cwksp_reserve_aligned64(ws, maxNbSeq * sizeof(SeqDef));
+
+        /* ldm hash table */
+        if (params->ldmParams.enableLdm == ZSTD_ps_enable) {
+            /* TODO: avoid memset? */
+            size_t const ldmHSize = ((size_t)1) << params->ldmParams.hashLog;
+            zc->ldmState.hashTable = (ldmEntry_t*)ZSTD_cwksp_reserve_aligned64(ws, ldmHSize * sizeof(ldmEntry_t));
+            ZSTD_memset(zc->ldmState.hashTable, 0, ldmHSize * sizeof(ldmEntry_t));
+            zc->ldmSequences = (rawSeq*)ZSTD_cwksp_reserve_aligned64(ws, maxNbLdmSeq * sizeof(rawSeq));
+            zc->maxNbLdmSequences = maxNbLdmSeq;
+
+            ZSTD_window_init(&zc->ldmState.window);
+            zc->ldmState.loadedDictEnd = 0;
+        }
+
+        /* reserve space for block-level external sequences */
+        if (ZSTD_hasExtSeqProd(params)) {
+            size_t const maxNbExternalSeq = ZSTD_sequenceBound(blockSize);
+            zc->extSeqBufCapacity = maxNbExternalSeq;
+            zc->extSeqBuf =
+                (ZSTD_Sequence*)ZSTD_cwksp_reserve_aligned64(ws, maxNbExternalSeq * sizeof(ZSTD_Sequence));
+        }
+
+        /* buffers */
+
        /* ZSTD_wildcopy() is used to copy into the literals buffer,
         * so we have to oversize the buffer by WILDCOPY_OVERLENGTH bytes.
         */
        zc->seqStore.litStart = ZSTD_cwksp_reserve_buffer(ws, blockSize + WILDCOPY_OVERLENGTH);
        zc->seqStore.maxNbLit = blockSize;

-        /* buffers */
        zc->bufferedPolicy = zbuff;
        zc->inBuffSize = buffInSize;
        zc->inBuff = (char*)ZSTD_cwksp_reserve_buffer(ws, buffInSize);
@@ -1883,32 +2195,9 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
        zc->seqStore.llCode = ZSTD_cwksp_reserve_buffer(ws, maxNbSeq * sizeof(BYTE));
        zc->seqStore.mlCode = ZSTD_cwksp_reserve_buffer(ws, maxNbSeq * sizeof(BYTE));
        zc->seqStore.ofCode = ZSTD_cwksp_reserve_buffer(ws, maxNbSeq * sizeof(BYTE));
-        zc->seqStore.sequencesStart = (seqDef*)ZSTD_cwksp_reserve_aligned(ws, maxNbSeq * sizeof(seqDef));
-
-        FORWARD_IF_ERROR(ZSTD_reset_matchState(
-            &zc->blockState.matchState,
-            ws,
-            &params->cParams,
-            params->useRowMatchFinder,
-            crp,
-            needsIndexReset,
-            ZSTD_resetTarget_CCtx), "");
-
-        /* ldm hash table */
-        if (params->ldmParams.enableLdm == ZSTD_ps_enable) {
-            /* TODO: avoid memset? */
-            size_t const ldmHSize = ((size_t)1) << params->ldmParams.hashLog;
-            zc->ldmState.hashTable = (ldmEntry_t*)ZSTD_cwksp_reserve_aligned(ws, ldmHSize * sizeof(ldmEntry_t));
-            ZSTD_memset(zc->ldmState.hashTable, 0, ldmHSize * sizeof(ldmEntry_t));
-            zc->ldmSequences = (rawSeq*)ZSTD_cwksp_reserve_aligned(ws, maxNbLdmSeq * sizeof(rawSeq));
-            zc->maxNbLdmSequences = maxNbLdmSeq;
-
-            ZSTD_window_init(&zc->ldmState.window);
-            zc->ldmState.loadedDictEnd = 0;
-        }

        DEBUGLOG(3, "wksp: finished allocating, %zd bytes remain available", ZSTD_cwksp_available_space(ws));
-        assert(ZSTD_cwksp_estimated_space_within_bounds(ws, neededSpace, resizeWorkspace));
+        assert(ZSTD_cwksp_estimated_space_within_bounds(ws, neededSpace));

        zc->initialized = 1;

@@ -1980,7 +2269,8 @@ ZSTD_resetCCtx_byAttachingCDict(ZSTD_CCtx* cctx,
        }

        params.cParams = ZSTD_adjustCParams_internal(adjusted_cdict_cParams, pledgedSrcSize,
-                                                     cdict->dictContentSize, ZSTD_cpm_attachDict);
+                                                     cdict->dictContentSize, ZSTD_cpm_attachDict,
+                                                     params.useRowMatchFinder);
        params.cParams.windowLog = windowLog;
        params.useRowMatchFinder = cdict->useRowMatchFinder;    /* cdict overrides */
        FORWARD_IF_ERROR(ZSTD_resetCCtx_internal(cctx, &params, pledgedSrcSize,
@@ -2019,6 +2309,22 @@ ZSTD_resetCCtx_byAttachingCDict(ZSTD_CCtx* cctx,
    return 0;
 }

+static void ZSTD_copyCDictTableIntoCCtx(U32* dst, U32 const* src, size_t tableSize,
+                                        ZSTD_compressionParameters const* cParams) {
+    if (ZSTD_CDictIndicesAreTagged(cParams)){
+        /* Remove tags from the CDict table if they are present.
+         * See docs on "short cache" in zstd_compress_internal.h for context. */
+        size_t i;
+        for (i = 0; i < tableSize; i++) {
+            U32 const taggedIndex = src[i];
+            U32 const index = taggedIndex >> ZSTD_SHORT_CACHE_TAG_BITS;
+            dst[i] = index;
+        }
+    } else {
+        ZSTD_memcpy(dst, src, tableSize * sizeof(U32));
+    }
+}
+
 static size_t ZSTD_resetCCtx_byCopyingCDict(ZSTD_CCtx* cctx,
                            const ZSTD_CDict* cdict,
                            ZSTD_CCtx_params params,
@@ -2054,26 +2360,29 @@ static size_t ZSTD_resetCCtx_byCopyingCDict(ZSTD_CCtx* cctx,
                                : 0;
        size_t const hSize =  (size_t)1 << cdict_cParams->hashLog;

-        ZSTD_memcpy(cctx->blockState.matchState.hashTable,
-               cdict->matchState.hashTable,
-               hSize * sizeof(U32));
+        ZSTD_copyCDictTableIntoCCtx(cctx->blockState.matchState.hashTable,
+                                cdict->matchState.hashTable,
+                                hSize, cdict_cParams);
+
        /* Do not copy cdict's chainTable if cctx has parameters such that it would not use chainTable */
        if (ZSTD_allocateChainTable(cctx->appliedParams.cParams.strategy, cctx->appliedParams.useRowMatchFinder, 0 /* forDDSDict */)) {
-            ZSTD_memcpy(cctx->blockState.matchState.chainTable,
-                   cdict->matchState.chainTable,
-                   chainSize * sizeof(U32));
+            ZSTD_copyCDictTableIntoCCtx(cctx->blockState.matchState.chainTable,
+                                    cdict->matchState.chainTable,
+                                    chainSize, cdict_cParams);
        }
        /* copy tag table */
        if (ZSTD_rowMatchFinderUsed(cdict_cParams->strategy, cdict->useRowMatchFinder)) {
-            size_t const tagTableSize = hSize*sizeof(U16);
+            size_t const tagTableSize = hSize;
            ZSTD_memcpy(cctx->blockState.matchState.tagTable,
-                   cdict->matchState.tagTable,
-                   tagTableSize);
+                        cdict->matchState.tagTable,
+                        tagTableSize);
+            cctx->blockState.matchState.hashSalt = cdict->matchState.hashSalt;
        }
    }

    /* Zero the hashTable3, since the cdict never fills it */
-    {   int const h3log = cctx->blockState.matchState.hashLog3;
+    assert(cctx->blockState.matchState.hashLog3 <= 31);
+    {   U32 const h3log = cctx->blockState.matchState.hashLog3;
        size_t const h3Size = h3log ? ((size_t)1 << h3log) : 0;
        assert(cdict->matchState.hashLog3 == 0);
        ZSTD_memset(cctx->blockState.matchState.hashTable3, 0, h3Size * sizeof(U32));
@@ -2082,8 +2391,8 @@ static size_t ZSTD_resetCCtx_byCopyingCDict(ZSTD_CCtx* cctx,
    ZSTD_cwksp_mark_tables_clean(&cctx->workspace);

    /* copy dictionary offsets */
-    {   ZSTD_matchState_t const* srcMatchState = &cdict->matchState;
-        ZSTD_matchState_t* dstMatchState = &cctx->blockState.matchState;
+    {   ZSTD_MatchState_t const* srcMatchState = &cdict->matchState;
+        ZSTD_MatchState_t* dstMatchState = &cctx->blockState.matchState;
        dstMatchState->window       = srcMatchState->window;
        dstMatchState->nextToUpdate = srcMatchState->nextToUpdate;
        dstMatchState->loadedDictEnd= srcMatchState->loadedDictEnd;
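For context on the tag stripping above: with "short cache", each U32 entry in a CDict's hash/chain tables packs the real match index in the high bits and a small hash tag in the low ZSTD_SHORT_CACHE_TAG_BITS bits, so probes can reject most misses without touching dictionary content. A worked sketch with hypothetical values (assuming 8 tag bits, as upstream uses):

    U32 const taggedIndex = (0x123456u << 8) | 0xABu;   /* packed: 0x123456AB */
    U32 const index       = taggedIndex >> 8;           /* back to 0x123456   */

The CCtx working tables are untagged, hence the shift-only copy in ZSTD_copyCDictTableIntoCCtx().
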
@@ -2141,12 +2450,13 @@ static size_t ZSTD_copyCCtx_internal(ZSTD_CCtx* dstCCtx,
        /* Copy only compression parameters related to tables. */
        params.cParams = srcCCtx->appliedParams.cParams;
        assert(srcCCtx->appliedParams.useRowMatchFinder != ZSTD_ps_auto);
-        assert(srcCCtx->appliedParams.useBlockSplitter != ZSTD_ps_auto);
+        assert(srcCCtx->appliedParams.postBlockSplitter != ZSTD_ps_auto);
        assert(srcCCtx->appliedParams.ldmParams.enableLdm != ZSTD_ps_auto);
        params.useRowMatchFinder = srcCCtx->appliedParams.useRowMatchFinder;
-        params.useBlockSplitter = srcCCtx->appliedParams.useBlockSplitter;
+        params.postBlockSplitter = srcCCtx->appliedParams.postBlockSplitter;
        params.ldmParams = srcCCtx->appliedParams.ldmParams;
        params.fParams = fParams;
+        params.maxBlockSize = srcCCtx->appliedParams.maxBlockSize;
        ZSTD_resetCCtx_internal(dstCCtx, &params, pledgedSrcSize,
                                /* loadedDictSize */ 0,
                                ZSTDcrp_leaveDirty, zbuff);
@@ -2166,7 +2476,7 @@ static size_t ZSTD_copyCCtx_internal(ZSTD_CCtx* dstCCtx,
                              ? ((size_t)1 << srcCCtx->appliedParams.cParams.chainLog)
                              : 0;
        size_t const hSize = (size_t)1 << srcCCtx->appliedParams.cParams.hashLog;
-        int const h3log = srcCCtx->blockState.matchState.hashLog3;
+        U32 const h3log = srcCCtx->blockState.matchState.hashLog3;
        size_t const h3Size = h3log ? ((size_t)1 << h3log) : 0;

        ZSTD_memcpy(dstCCtx->blockState.matchState.hashTable,
@@ -2184,8 +2494,8 @@ static size_t ZSTD_copyCCtx_internal(ZSTD_CCtx* dstCCtx,

    /* copy dictionary offsets */
    {
-        const ZSTD_matchState_t* srcMatchState = &srcCCtx->blockState.matchState;
-        ZSTD_matchState_t* dstMatchState = &dstCCtx->blockState.matchState;
+        const ZSTD_MatchState_t* srcMatchState = &srcCCtx->blockState.matchState;
+        ZSTD_MatchState_t* dstMatchState = &dstCCtx->blockState.matchState;
        dstMatchState->window       = srcMatchState->window;
        dstMatchState->nextToUpdate = srcMatchState->nextToUpdate;
        dstMatchState->loadedDictEnd= srcMatchState->loadedDictEnd;
@@ -2234,7 +2544,7 @@ ZSTD_reduceTable_internal (U32* const table, U32 const size, U32 const reducerValue,
    /* Protect special index values < ZSTD_WINDOW_START_INDEX. */
    U32 const reducerThreshold = reducerValue + ZSTD_WINDOW_START_INDEX;
    assert((size & (ZSTD_ROWSIZE-1)) == 0);  /* multiple of ZSTD_ROWSIZE */
-    assert(size < (1U<<31));  /* can be casted to int */
+    assert(size < (1U<<31));  /* can be cast to int */


    for (rowNb=0 ; rowNb < nbRows ; rowNb++) {
@@ -2267,7 +2577,7 @@ static void ZSTD_reduceTable_btlazy2(U32* const table, U32 const size, U32 const

 /*! ZSTD_reduceIndex() :
 *   rescale all indexes to avoid future overflow (indexes are U32) */
-static void ZSTD_reduceIndex (ZSTD_matchState_t* ms, ZSTD_CCtx_params const* params, const U32 reducerValue)
+static void ZSTD_reduceIndex (ZSTD_MatchState_t* ms, ZSTD_CCtx_params const* params, const U32 reducerValue)
 {
    {   U32 const hSize = (U32)1 << params->cParams.hashLog;
        ZSTD_reduceTable(ms->hashTable, hSize, reducerValue);
@@ -2294,26 +2604,32 @@ static void ZSTD_reduceIndex (ZSTD_matchState_t* ms, ZSTD_CCtx_params const* params,

 /* See doc/zstd_compression_format.md for detailed format description */

-void ZSTD_seqToCodes(const seqStore_t* seqStorePtr)
+int ZSTD_seqToCodes(const SeqStore_t* seqStorePtr)
 {
-    const seqDef* const sequences = seqStorePtr->sequencesStart;
+    const SeqDef* const sequences = seqStorePtr->sequencesStart;
    BYTE* const llCodeTable = seqStorePtr->llCode;
    BYTE* const ofCodeTable = seqStorePtr->ofCode;
    BYTE* const mlCodeTable = seqStorePtr->mlCode;
    U32 const nbSeq = (U32)(seqStorePtr->sequences - seqStorePtr->sequencesStart);
    U32 u;
+    int longOffsets = 0;
    assert(nbSeq <= seqStorePtr->maxNbSeq);
    for (u=0; u<nbSeq; u++) {
        U32 const llv = sequences[u].litLength;
        U32 const ofCode = ZSTD_highbit32(sequences[u].offBase);
        U32 const mlv = sequences[u].mlBase;
        llCodeTable[u] = (BYTE)ZSTD_LLcode(llv);
        ofCodeTable[u] = (BYTE)ofCode;
        mlCodeTable[u] = (BYTE)ZSTD_MLcode(mlv);
        assert(!(MEM_64bits() && ofCode >= STREAM_ACCUMULATOR_MIN));
+        if (MEM_32bits() && ofCode >= STREAM_ACCUMULATOR_MIN)
+            longOffsets = 1;
    }
    if (seqStorePtr->longLengthType==ZSTD_llt_literalLength)
        llCodeTable[seqStorePtr->longLengthPos] = MaxLL;
    if (seqStorePtr->longLengthType==ZSTD_llt_matchLength)
        mlCodeTable[seqStorePtr->longLengthPos] = MaxML;
+    return longOffsets;
 }

 /* ZSTD_useTargetCBlockSize():
@@ -2333,9 +2649,9 @@ static int ZSTD_useTargetCBlockSize(const ZSTD_CCtx_params* cctxParams)
 * Returns 1 if true, 0 otherwise. */
 static int ZSTD_blockSplitterEnabled(ZSTD_CCtx_params* cctxParams)
 {
-    DEBUGLOG(5, "ZSTD_blockSplitterEnabled (useBlockSplitter=%d)", cctxParams->useBlockSplitter);
-    assert(cctxParams->useBlockSplitter != ZSTD_ps_auto);
-    return (cctxParams->useBlockSplitter == ZSTD_ps_enable);
+    DEBUGLOG(5, "ZSTD_blockSplitterEnabled (postBlockSplitter=%d)", cctxParams->postBlockSplitter);
+    assert(cctxParams->postBlockSplitter != ZSTD_ps_auto);
+    return (cctxParams->postBlockSplitter == ZSTD_ps_enable);
 }

 /* Type returned by ZSTD_buildSequencesStatistics containing finalized symbol encoding types
@@ -2347,6 +2663,7 @@ typedef struct {
    U32 MLtype;
    size_t size;
    size_t lastCountSize; /* Accounts for bug in 1.3.4. More detail in ZSTD_entropyCompressSeqStore_internal() */
+    int longOffsets;
 } ZSTD_symbolEncodingTypeStats_t;

 /* ZSTD_buildSequencesStatistics():
@@ -2357,11 +2674,13 @@ typedef struct {
 * entropyWkspSize must be of size at least ENTROPY_WORKSPACE_SIZE - (MaxSeq + 1)*sizeof(U32)
 */
 static ZSTD_symbolEncodingTypeStats_t
-ZSTD_buildSequencesStatistics(seqStore_t* seqStorePtr, size_t nbSeq,
-                        const ZSTD_fseCTables_t* prevEntropy, ZSTD_fseCTables_t* nextEntropy,
-                        BYTE* dst, const BYTE* const dstEnd,
-                        ZSTD_strategy strategy, unsigned* countWorkspace,
-                        void* entropyWorkspace, size_t entropyWkspSize) {
+ZSTD_buildSequencesStatistics(
+                  const SeqStore_t* seqStorePtr, size_t nbSeq,
+                  const ZSTD_fseCTables_t* prevEntropy, ZSTD_fseCTables_t* nextEntropy,
+                  BYTE* dst, const BYTE* const dstEnd,
+                  ZSTD_strategy strategy, unsigned* countWorkspace,
+                  void* entropyWorkspace, size_t entropyWkspSize)
+{
    BYTE* const ostart = dst;
    const BYTE* const oend = dstEnd;
    BYTE* op = ostart;
@@ -2375,7 +2694,7 @@ ZSTD_buildSequencesStatistics(seqStore_t* seqStorePtr, size_t nbSeq,

    stats.lastCountSize = 0;
    /* convert length/distances into codes */
-    ZSTD_seqToCodes(seqStorePtr);
+    stats.longOffsets = ZSTD_seqToCodes(seqStorePtr);
    assert(op <= oend);
    assert(nbSeq != 0); /* ZSTD_selectEncodingType() divides by nbSeq */
    /* build CTable for Literal Lengths */
@@ -2392,7 +2711,7 @@ ZSTD_buildSequencesStatistics(seqStore_t* seqStorePtr, size_t nbSeq,
        assert(!(stats.LLtype < set_compressed && nextEntropy->litlength_repeatMode != FSE_repeat_none)); /* We don't copy tables */
        {   size_t const countSize = ZSTD_buildCTable(
                op, (size_t)(oend - op),
-                CTable_LitLength, LLFSELog, (symbolEncodingType_e)stats.LLtype,
+                CTable_LitLength, LLFSELog, (SymbolEncodingType_e)stats.LLtype,
                countWorkspace, max, llCodeTable, nbSeq,
                LL_defaultNorm, LL_defaultNormLog, MaxLL,
                prevEntropy->litlengthCTable,
@@ -2413,7 +2732,7 @@ ZSTD_buildSequencesStatistics(seqStore_t* seqStorePtr, size_t nbSeq,
        size_t const mostFrequent = HIST_countFast_wksp(
            countWorkspace, &max, ofCodeTable, nbSeq, entropyWorkspace, entropyWkspSize);  /* can't fail */
        /* We can only use the basic table if max <= DefaultMaxOff, otherwise the offsets are too large */
-        ZSTD_defaultPolicy_e const defaultPolicy = (max <= DefaultMaxOff) ? ZSTD_defaultAllowed : ZSTD_defaultDisallowed;
+        ZSTD_DefaultPolicy_e const defaultPolicy = (max <= DefaultMaxOff) ? ZSTD_defaultAllowed : ZSTD_defaultDisallowed;
        DEBUGLOG(5, "Building OF table");
        nextEntropy->offcode_repeatMode = prevEntropy->offcode_repeatMode;
        stats.Offtype = ZSTD_selectEncodingType(&nextEntropy->offcode_repeatMode,
@@ -2424,7 +2743,7 @@ ZSTD_buildSequencesStatistics(seqStore_t* seqStorePtr, size_t nbSeq,
        assert(!(stats.Offtype < set_compressed && nextEntropy->offcode_repeatMode != FSE_repeat_none)); /* We don't copy tables */
        {   size_t const countSize = ZSTD_buildCTable(
                op, (size_t)(oend - op),
-                CTable_OffsetBits, OffFSELog, (symbolEncodingType_e)stats.Offtype,
+                CTable_OffsetBits, OffFSELog, (SymbolEncodingType_e)stats.Offtype,
                countWorkspace, max, ofCodeTable, nbSeq,
                OF_defaultNorm, OF_defaultNormLog, DefaultMaxOff,
                prevEntropy->offcodeCTable,
@@ -2454,7 +2773,7 @@ ZSTD_buildSequencesStatistics(seqStore_t* seqStorePtr, size_t nbSeq,
        assert(!(stats.MLtype < set_compressed && nextEntropy->matchlength_repeatMode != FSE_repeat_none)); /* We don't copy tables */
        {   size_t const countSize = ZSTD_buildCTable(
                op, (size_t)(oend - op),
-                CTable_MatchLength, MLFSELog, (symbolEncodingType_e)stats.MLtype,
+                CTable_MatchLength, MLFSELog, (SymbolEncodingType_e)stats.MLtype,
                countWorkspace, max, mlCodeTable, nbSeq,
                ML_defaultNorm, ML_defaultNormLog, MaxML,
                prevEntropy->matchlengthCTable,
@@ -2480,22 +2799,23 @@ ZSTD_buildSequencesStatistics(seqStore_t* seqStorePtr, size_t nbSeq,
 */
 #define SUSPECT_UNCOMPRESSIBLE_LITERAL_RATIO 20
 MEM_STATIC size_t
-ZSTD_entropyCompressSeqStore_internal(seqStore_t* seqStorePtr,
-                          const ZSTD_entropyCTables_t* prevEntropy,
-                                ZSTD_entropyCTables_t* nextEntropy,
-                          const ZSTD_CCtx_params* cctxParams,
-                          void* dst, size_t dstCapacity,
-                          void* entropyWorkspace, size_t entropyWkspSize,
-                          const int bmi2)
+ZSTD_entropyCompressSeqStore_internal(
+                          void* dst, size_t dstCapacity,
+                          const void* literals, size_t litSize,
+                          const SeqStore_t* seqStorePtr,
+                          const ZSTD_entropyCTables_t* prevEntropy,
+                                ZSTD_entropyCTables_t* nextEntropy,
+                          const ZSTD_CCtx_params* cctxParams,
+                          void* entropyWorkspace, size_t entropyWkspSize,
+                          const int bmi2)
 {
-    const int longOffsets = cctxParams->cParams.windowLog > STREAM_ACCUMULATOR_MIN;
    ZSTD_strategy const strategy = cctxParams->cParams.strategy;
    unsigned* count = (unsigned*)entropyWorkspace;
    FSE_CTable* CTable_LitLength = nextEntropy->fse.litlengthCTable;
    FSE_CTable* CTable_OffsetBits = nextEntropy->fse.offcodeCTable;
    FSE_CTable* CTable_MatchLength = nextEntropy->fse.matchlengthCTable;
-    const seqDef* const sequences = seqStorePtr->sequencesStart;
-    const size_t nbSeq = seqStorePtr->sequences - seqStorePtr->sequencesStart;
+    const SeqDef* const sequences = seqStorePtr->sequencesStart;
+    const size_t nbSeq = (size_t)(seqStorePtr->sequences - seqStorePtr->sequencesStart);
    const BYTE* const ofCodeTable = seqStorePtr->ofCode;
    const BYTE* const llCodeTable = seqStorePtr->llCode;
    const BYTE* const mlCodeTable = seqStorePtr->mlCode;
@@ -2503,29 +2823,28 @@ ZSTD_entropyCompressSeqStore_internal(seqStore_t* seqStorePtr,
    BYTE* const oend = ostart + dstCapacity;
    BYTE* op = ostart;
    size_t lastCountSize;
+    int longOffsets = 0;

    entropyWorkspace = count + (MaxSeq + 1);
    entropyWkspSize -= (MaxSeq + 1) * sizeof(*count);

-    DEBUGLOG(4, "ZSTD_entropyCompressSeqStore_internal (nbSeq=%zu)", nbSeq);
+    DEBUGLOG(5, "ZSTD_entropyCompressSeqStore_internal (nbSeq=%zu, dstCapacity=%zu)", nbSeq, dstCapacity);
    ZSTD_STATIC_ASSERT(HUF_WORKSPACE_SIZE >= (1<<MAX(MLFSELog,LLFSELog)));
    assert(entropyWkspSize >= HUF_WORKSPACE_SIZE);

    /* Compress literals */
-    {   const BYTE* const literals = seqStorePtr->litStart;
-        size_t const numSequences = seqStorePtr->sequences - seqStorePtr->sequencesStart;
-        size_t const numLiterals = seqStorePtr->lit - seqStorePtr->litStart;
+    {   size_t const numSequences = (size_t)(seqStorePtr->sequences - seqStorePtr->sequencesStart);
        /* Base suspicion of uncompressibility on ratio of literals to sequences */
-        unsigned const suspectUncompressible = (numSequences == 0) || (numLiterals / numSequences >= SUSPECT_UNCOMPRESSIBLE_LITERAL_RATIO);
-        size_t const litSize = (size_t)(seqStorePtr->lit - literals);
+        int const suspectUncompressible = (numSequences == 0) || (litSize / numSequences >= SUSPECT_UNCOMPRESSIBLE_LITERAL_RATIO);
+
        size_t const cSize = ZSTD_compressLiterals(
-                                    &prevEntropy->huf, &nextEntropy->huf,
-                                    cctxParams->cParams.strategy,
-                                    ZSTD_literalsCompressionIsDisabled(cctxParams),
                                    op, dstCapacity,
                                    literals, litSize,
                                    entropyWorkspace, entropyWkspSize,
-                                    bmi2, suspectUncompressible);
+                                    &prevEntropy->huf, &nextEntropy->huf,
+                                    cctxParams->cParams.strategy,
+                                    ZSTD_literalsCompressionIsDisabled(cctxParams),
+                                    suspectUncompressible, bmi2);
        FORWARD_IF_ERROR(cSize, "ZSTD_compressLiterals failed");
        assert(cSize <= dstCapacity);
        op += cSize;
@@ -2551,11 +2870,10 @@ ZSTD_entropyCompressSeqStore_internal(seqStore_t* seqStorePtr,
        ZSTD_memcpy(&nextEntropy->fse, &prevEntropy->fse, sizeof(prevEntropy->fse));
        return (size_t)(op - ostart);
    }
-    {
-        ZSTD_symbolEncodingTypeStats_t stats;
-        BYTE* seqHead = op++;
+    {   BYTE* const seqHead = op++;
        /* build stats for sequences */
-        stats = ZSTD_buildSequencesStatistics(seqStorePtr, nbSeq,
+        const ZSTD_symbolEncodingTypeStats_t stats =
+                ZSTD_buildSequencesStatistics(seqStorePtr, nbSeq,
                                             &prevEntropy->fse, &nextEntropy->fse,
                                             op, oend,
                                             strategy, count,
@@ -2564,6 +2882,7 @@ ZSTD_entropyCompressSeqStore_internal(seqStore_t* seqStorePtr,
        *seqHead = (BYTE)((stats.LLtype<<6) + (stats.Offtype<<4) + (stats.MLtype<<2));
        lastCountSize = stats.lastCountSize;
        op += stats.size;
+        longOffsets = stats.longOffsets;
    }

    {   size_t const bitstreamSize = ZSTD_encodeSequences(
@@ -2597,104 +2916,146 @@ ZSTD_entropyCompressSeqStore_internal(seqStore_t* seqStorePtr,
    return (size_t)(op - ostart);
 }

-MEM_STATIC size_t
-ZSTD_entropyCompressSeqStore(seqStore_t* seqStorePtr,
-                       const ZSTD_entropyCTables_t* prevEntropy,
-                             ZSTD_entropyCTables_t* nextEntropy,
-                       const ZSTD_CCtx_params* cctxParams,
-                             void* dst, size_t dstCapacity,
-                             size_t srcSize,
-                             void* entropyWorkspace, size_t entropyWkspSize,
-                             int bmi2)
+static size_t
+ZSTD_entropyCompressSeqStore_wExtLitBuffer(
+                    void* dst, size_t dstCapacity,
+                    const void* literals, size_t litSize,
+                    size_t blockSize,
+                    const SeqStore_t* seqStorePtr,
+                    const ZSTD_entropyCTables_t* prevEntropy,
+                    ZSTD_entropyCTables_t* nextEntropy,
+                    const ZSTD_CCtx_params* cctxParams,
+                    void* entropyWorkspace, size_t entropyWkspSize,
+                    int bmi2)
 {
    size_t const cSize = ZSTD_entropyCompressSeqStore_internal(
-                            seqStorePtr, prevEntropy, nextEntropy, cctxParams,
                            dst, dstCapacity,
+                            literals, litSize,
+                            seqStorePtr, prevEntropy, nextEntropy, cctxParams,
                            entropyWorkspace, entropyWkspSize, bmi2);
    if (cSize == 0) return 0;
    /* When srcSize <= dstCapacity, there is enough space to write a raw uncompressed block.
     * Since we ran out of space, the block must not be compressible, so fall back to a raw uncompressed block.
     */
-    if ((cSize == ERROR(dstSize_tooSmall)) & (srcSize <= dstCapacity))
+    if ((cSize == ERROR(dstSize_tooSmall)) & (blockSize <= dstCapacity)) {
+        DEBUGLOG(4, "not enough dstCapacity (%zu) for ZSTD_entropyCompressSeqStore_internal() => do not compress block", dstCapacity);
        return 0;  /* block not compressed */
+    }
    FORWARD_IF_ERROR(cSize, "ZSTD_entropyCompressSeqStore_internal failed");

    /* Check compressibility */
-    {   size_t const maxCSize = srcSize - ZSTD_minGain(srcSize, cctxParams->cParams.strategy);
+    {   size_t const maxCSize = blockSize - ZSTD_minGain(blockSize, cctxParams->cParams.strategy);
        if (cSize >= maxCSize) return 0;  /* block not compressed */
    }
-    DEBUGLOG(4, "ZSTD_entropyCompressSeqStore() cSize: %zu", cSize);
+    DEBUGLOG(5, "ZSTD_entropyCompressSeqStore() cSize: %zu", cSize);
+    /* libzstd decoders older than v1.5.4 are not compatible with compressed blocks of size ZSTD_BLOCKSIZE_MAX exactly.
+     * This restriction is indirectly already fulfilled by respecting the ZSTD_minGain() condition above.
+     */
+    assert(cSize < ZSTD_BLOCKSIZE_MAX);
    return cSize;
 }

+static size_t
+ZSTD_entropyCompressSeqStore(
+                    const SeqStore_t* seqStorePtr,
+                    const ZSTD_entropyCTables_t* prevEntropy,
+                    ZSTD_entropyCTables_t* nextEntropy,
+                    const ZSTD_CCtx_params* cctxParams,
+                    void* dst, size_t dstCapacity,
+                    size_t srcSize,
+                    void* entropyWorkspace, size_t entropyWkspSize,
+                    int bmi2)
+{
+    return ZSTD_entropyCompressSeqStore_wExtLitBuffer(
+            dst, dstCapacity,
+            seqStorePtr->litStart, (size_t)(seqStorePtr->lit - seqStorePtr->litStart),
+            srcSize,
+            seqStorePtr,
+            prevEntropy, nextEntropy,
+            cctxParams,
+            entropyWorkspace, entropyWkspSize,
+            bmi2);
+}
+
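The compressibility check above requires an entropy-coded block to beat a raw block by a margin before it is kept. Worked arithmetic, assuming upstream's ZSTD_minGain() is roughly srcSize/64 plus a small constant for the non-btultra strategies (illustrative, not from this patch):

    size_t const blockSize = 131072;                        /* 128 KB    */
    size_t const maxCSize  = blockSize - (blockSize >> 6);  /* ~129024 B */
    /* cSize >= maxCSize => emit the block raw instead of compressed     */

This margin is also what keeps cSize strictly below ZSTD_BLOCKSIZE_MAX, which the new assert documents for pre-1.5.4 decoder compatibility.
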
 /* ZSTD_selectBlockCompressor() :
 * Not static, but internal use only (used by long distance matcher)
 * assumption : strat is a valid strategy */
-ZSTD_blockCompressor ZSTD_selectBlockCompressor(ZSTD_strategy strat, ZSTD_paramSwitch_e useRowMatchFinder, ZSTD_dictMode_e dictMode)
+ZSTD_BlockCompressor_f ZSTD_selectBlockCompressor(ZSTD_strategy strat, ZSTD_ParamSwitch_e useRowMatchFinder, ZSTD_dictMode_e dictMode)
 {
-    static const ZSTD_blockCompressor blockCompressor[4][ZSTD_STRATEGY_MAX+1] = {
+    static const ZSTD_BlockCompressor_f blockCompressor[4][ZSTD_STRATEGY_MAX+1] = {
        { ZSTD_compressBlock_fast  /* default for 0 */,
          ZSTD_compressBlock_fast,
-          ZSTD_compressBlock_doubleFast,
-          ZSTD_compressBlock_greedy,
-          ZSTD_compressBlock_lazy,
-          ZSTD_compressBlock_lazy2,
-          ZSTD_compressBlock_btlazy2,
-          ZSTD_compressBlock_btopt,
-          ZSTD_compressBlock_btultra,
-          ZSTD_compressBlock_btultra2 },
+          ZSTD_COMPRESSBLOCK_DOUBLEFAST,
+          ZSTD_COMPRESSBLOCK_GREEDY,
+          ZSTD_COMPRESSBLOCK_LAZY,
+          ZSTD_COMPRESSBLOCK_LAZY2,
+          ZSTD_COMPRESSBLOCK_BTLAZY2,
+          ZSTD_COMPRESSBLOCK_BTOPT,
+          ZSTD_COMPRESSBLOCK_BTULTRA,
+          ZSTD_COMPRESSBLOCK_BTULTRA2
+        },
        { ZSTD_compressBlock_fast_extDict  /* default for 0 */,
          ZSTD_compressBlock_fast_extDict,
-          ZSTD_compressBlock_doubleFast_extDict,
-          ZSTD_compressBlock_greedy_extDict,
-          ZSTD_compressBlock_lazy_extDict,
-          ZSTD_compressBlock_lazy2_extDict,
-          ZSTD_compressBlock_btlazy2_extDict,
-          ZSTD_compressBlock_btopt_extDict,
-          ZSTD_compressBlock_btultra_extDict,
-          ZSTD_compressBlock_btultra_extDict },
+          ZSTD_COMPRESSBLOCK_DOUBLEFAST_EXTDICT,
+          ZSTD_COMPRESSBLOCK_GREEDY_EXTDICT,
+          ZSTD_COMPRESSBLOCK_LAZY_EXTDICT,
+          ZSTD_COMPRESSBLOCK_LAZY2_EXTDICT,
+          ZSTD_COMPRESSBLOCK_BTLAZY2_EXTDICT,
+          ZSTD_COMPRESSBLOCK_BTOPT_EXTDICT,
+          ZSTD_COMPRESSBLOCK_BTULTRA_EXTDICT,
+          ZSTD_COMPRESSBLOCK_BTULTRA_EXTDICT
+        },
        { ZSTD_compressBlock_fast_dictMatchState  /* default for 0 */,
          ZSTD_compressBlock_fast_dictMatchState,
-          ZSTD_compressBlock_doubleFast_dictMatchState,
-          ZSTD_compressBlock_greedy_dictMatchState,
-          ZSTD_compressBlock_lazy_dictMatchState,
-          ZSTD_compressBlock_lazy2_dictMatchState,
-          ZSTD_compressBlock_btlazy2_dictMatchState,
-          ZSTD_compressBlock_btopt_dictMatchState,
-          ZSTD_compressBlock_btultra_dictMatchState,
-          ZSTD_compressBlock_btultra_dictMatchState },
+          ZSTD_COMPRESSBLOCK_DOUBLEFAST_DICTMATCHSTATE,
+          ZSTD_COMPRESSBLOCK_GREEDY_DICTMATCHSTATE,
+          ZSTD_COMPRESSBLOCK_LAZY_DICTMATCHSTATE,
+          ZSTD_COMPRESSBLOCK_LAZY2_DICTMATCHSTATE,
+          ZSTD_COMPRESSBLOCK_BTLAZY2_DICTMATCHSTATE,
+          ZSTD_COMPRESSBLOCK_BTOPT_DICTMATCHSTATE,
+          ZSTD_COMPRESSBLOCK_BTULTRA_DICTMATCHSTATE,
+          ZSTD_COMPRESSBLOCK_BTULTRA_DICTMATCHSTATE
+        },
        { NULL  /* default for 0 */,
          NULL,
          NULL,
-          ZSTD_compressBlock_greedy_dedicatedDictSearch,
-          ZSTD_compressBlock_lazy_dedicatedDictSearch,
-          ZSTD_compressBlock_lazy2_dedicatedDictSearch,
+          ZSTD_COMPRESSBLOCK_GREEDY_DEDICATEDDICTSEARCH,
+          ZSTD_COMPRESSBLOCK_LAZY_DEDICATEDDICTSEARCH,
+          ZSTD_COMPRESSBLOCK_LAZY2_DEDICATEDDICTSEARCH,
          NULL,
          NULL,
          NULL,
          NULL }
    };
-    ZSTD_blockCompressor selectedCompressor;
+    ZSTD_BlockCompressor_f selectedCompressor;
    ZSTD_STATIC_ASSERT((unsigned)ZSTD_fast == 1);

-    assert(ZSTD_cParam_withinBounds(ZSTD_c_strategy, strat));
-    DEBUGLOG(4, "Selected block compressor: dictMode=%d strat=%d rowMatchfinder=%d", (int)dictMode, (int)strat, (int)useRowMatchFinder);
+    assert(ZSTD_cParam_withinBounds(ZSTD_c_strategy, (int)strat));
+    DEBUGLOG(5, "Selected block compressor: dictMode=%d strat=%d rowMatchfinder=%d", (int)dictMode, (int)strat, (int)useRowMatchFinder);
    if (ZSTD_rowMatchFinderUsed(strat, useRowMatchFinder)) {
-        static const ZSTD_blockCompressor rowBasedBlockCompressors[4][3] = {
-            { ZSTD_compressBlock_greedy_row,
-            ZSTD_compressBlock_lazy_row,
-            ZSTD_compressBlock_lazy2_row },
-            { ZSTD_compressBlock_greedy_extDict_row,
-            ZSTD_compressBlock_lazy_extDict_row,
-            ZSTD_compressBlock_lazy2_extDict_row },
-            { ZSTD_compressBlock_greedy_dictMatchState_row,
-            ZSTD_compressBlock_lazy_dictMatchState_row,
-            ZSTD_compressBlock_lazy2_dictMatchState_row },
-            { ZSTD_compressBlock_greedy_dedicatedDictSearch_row,
-            ZSTD_compressBlock_lazy_dedicatedDictSearch_row,
-            ZSTD_compressBlock_lazy2_dedicatedDictSearch_row }
+        static const ZSTD_BlockCompressor_f rowBasedBlockCompressors[4][3] = {
+            {
+                ZSTD_COMPRESSBLOCK_GREEDY_ROW,
+                ZSTD_COMPRESSBLOCK_LAZY_ROW,
+                ZSTD_COMPRESSBLOCK_LAZY2_ROW
+            },
+            {
+                ZSTD_COMPRESSBLOCK_GREEDY_EXTDICT_ROW,
+                ZSTD_COMPRESSBLOCK_LAZY_EXTDICT_ROW,
+                ZSTD_COMPRESSBLOCK_LAZY2_EXTDICT_ROW
+            },
+            {
+                ZSTD_COMPRESSBLOCK_GREEDY_DICTMATCHSTATE_ROW,
+                ZSTD_COMPRESSBLOCK_LAZY_DICTMATCHSTATE_ROW,
+                ZSTD_COMPRESSBLOCK_LAZY2_DICTMATCHSTATE_ROW
+            },
+            {
+                ZSTD_COMPRESSBLOCK_GREEDY_DEDICATEDDICTSEARCH_ROW,
+                ZSTD_COMPRESSBLOCK_LAZY_DEDICATEDDICTSEARCH_ROW,
+                ZSTD_COMPRESSBLOCK_LAZY2_DEDICATEDDICTSEARCH_ROW
+            }
        };
-        DEBUGLOG(4, "Selecting a row-based matchfinder");
+        DEBUGLOG(5, "Selecting a row-based matchfinder");
        assert(useRowMatchFinder != ZSTD_ps_auto);
        selectedCompressor = rowBasedBlockCompressors[(int)dictMode][(int)strat - (int)ZSTD_greedy];
    } else {
@@ -2704,30 +3065,126 @@ ZSTD_blockCompressor ZSTD_selectBlockCompressor(ZSTD_strategy strat, ZSTD_paramS
    return selectedCompressor;
 }

-static void ZSTD_storeLastLiterals(seqStore_t* seqStorePtr,
+static void ZSTD_storeLastLiterals(SeqStore_t* seqStorePtr,
                                   const BYTE* anchor, size_t lastLLSize)
 {
    ZSTD_memcpy(seqStorePtr->lit, anchor, lastLLSize);
    seqStorePtr->lit += lastLLSize;
 }

-void ZSTD_resetSeqStore(seqStore_t* ssPtr)
+void ZSTD_resetSeqStore(SeqStore_t* ssPtr)
 {
    ssPtr->lit = ssPtr->litStart;
    ssPtr->sequences = ssPtr->sequencesStart;
    ssPtr->longLengthType = ZSTD_llt_none;
 }

-typedef enum { ZSTDbss_compress, ZSTDbss_noCompress } ZSTD_buildSeqStore_e;
+/* ZSTD_postProcessSequenceProducerResult() :
+ * Validates and post-processes sequences obtained through the external matchfinder API:
+ *   - Checks whether nbExternalSeqs represents an error condition.
+ *   - Appends a block delimiter to outSeqs if one is not already present.
+ *     See zstd.h for context regarding block delimiters.
+ * Returns the number of sequences after post-processing, or an error code. */
+static size_t ZSTD_postProcessSequenceProducerResult(
+    ZSTD_Sequence* outSeqs, size_t nbExternalSeqs, size_t outSeqsCapacity, size_t srcSize
+) {
+    RETURN_ERROR_IF(
+        nbExternalSeqs > outSeqsCapacity,
+        sequenceProducer_failed,
+        "External sequence producer returned error code %lu",
+        (unsigned long)nbExternalSeqs
+    );
+
+    RETURN_ERROR_IF(
+        nbExternalSeqs == 0 && srcSize > 0,
+        sequenceProducer_failed,
+        "Got zero sequences from external sequence producer for a non-empty src buffer!"
+    );
+
+    if (srcSize == 0) {
+        ZSTD_memset(&outSeqs[0], 0, sizeof(ZSTD_Sequence));
+        return 1;
+    }
+
+    {
+        ZSTD_Sequence const lastSeq = outSeqs[nbExternalSeqs - 1];
+
+        /* We can return early if lastSeq is already a block delimiter. */
+        if (lastSeq.offset == 0 && lastSeq.matchLength == 0) {
+            return nbExternalSeqs;
+        }
+
+        /* This error condition is only possible if the external matchfinder
+         * produced an invalid parse, by definition of ZSTD_sequenceBound(). */
+        RETURN_ERROR_IF(
+            nbExternalSeqs == outSeqsCapacity,
+            sequenceProducer_failed,
+            "nbExternalSeqs == outSeqsCapacity but lastSeq is not a block delimiter!"
+        );
+
+        /* lastSeq is not a block delimiter, so we need to append one. */
+        ZSTD_memset(&outSeqs[nbExternalSeqs], 0, sizeof(ZSTD_Sequence));
+        return nbExternalSeqs + 1;
+    }
+}
+
+/* ZSTD_fastSequenceLengthSum() :
+ * Returns sum(litLen) + sum(matchLen) + lastLits for *seqBuf*.
+ * Similar to another function in zstd_compress.c (determine_blockSize),
+ * except it doesn't check for a block delimiter to end summation.
+ * Removing the early exit allows the compiler to auto-vectorize (https://godbolt.org/z/cY1cajz9P).
+ * This function can be deleted and replaced by determine_blockSize after we resolve issue #3456. */
+static size_t ZSTD_fastSequenceLengthSum(ZSTD_Sequence const* seqBuf, size_t seqBufSize) {
+    size_t matchLenSum, litLenSum, i;
+    matchLenSum = 0;
+    litLenSum = 0;
+    for (i = 0; i < seqBufSize; i++) {
+        litLenSum += seqBuf[i].litLength;
+        matchLenSum += seqBuf[i].matchLength;
+    }
+    return litLenSum + matchLenSum;
+}
+
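The helpers above service upstream's external sequence producer interface: a user-supplied matchfinder fills a ZSTD_Sequence buffer, and zstd validates and normalizes the result. A hedged sketch of how a producer is wired in with the upstream experimental API (user-side code; the QAT acceleration mentioned in the cover letter would sit behind such a callback):

    static size_t myProducer(void* state,
                             ZSTD_Sequence* outSeqs, size_t outSeqsCapacity,
                             const void* src, size_t srcSize,
                             const void* dict, size_t dictSize,
                             int compressionLevel, size_t windowSize)
    {
        /* fill outSeqs[] with a valid parse of src, returning the count;
         * returning a value > outSeqsCapacity signals failure */
        (void)state; (void)dict; (void)dictSize;
        (void)compressionLevel; (void)windowSize;
        return ZSTD_SEQUENCE_PRODUCER_ERROR;   /* placeholder body */
    }

    ZSTD_registerSequenceProducer(cctx, NULL /* state */, myProducer);

On success the sequences are copied into the seqStore by ZSTD_transferSequences_wBlockDelim(); on failure, behavior depends on ZSTD_c_enableSeqProducerFallback, as the block-compression path below shows.
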
+
+/* ZSTD_fastSequenceLengthSum() :
+ * Returns sum(litLen) + sum(matchLen) + lastLits for *seqBuf*.
+ * Similar to another function in zstd_compress.c (determine_blockSize),
+ * except it doesn't check for a block delimiter to end summation.
+ * Removing the early exit allows the compiler to auto-vectorize (https://godbolt.org/z/cY1cajz9P).
+ * This function can be deleted and replaced by determine_blockSize after we resolve issue #3456. */
+static size_t ZSTD_fastSequenceLengthSum(ZSTD_Sequence const* seqBuf, size_t seqBufSize) {
+    size_t matchLenSum, litLenSum, i;
+    matchLenSum = 0;
+    litLenSum = 0;
+    for (i = 0; i < seqBufSize; i++) {
+        litLenSum += seqBuf[i].litLength;
+        matchLenSum += seqBuf[i].matchLength;
+    }
+    return litLenSum + matchLenSum;
+}
+
+/*
+ * Function to validate sequences produced by a block compressor.
+ */
+static void ZSTD_validateSeqStore(const SeqStore_t* seqStore, const ZSTD_compressionParameters* cParams)
+{
+#if DEBUGLEVEL >= 1
+    const SeqDef* seq = seqStore->sequencesStart;
+    const SeqDef* const seqEnd = seqStore->sequences;
+    size_t const matchLenLowerBound = cParams->minMatch == 3 ? 3 : 4;
+    for (; seq < seqEnd; ++seq) {
+        const ZSTD_SequenceLength seqLength = ZSTD_getSequenceLength(seqStore, seq);
+        assert(seqLength.matchLength >= matchLenLowerBound);
+        (void)seqLength;
+        (void)matchLenLowerBound;
+    }
+#else
+    (void)seqStore;
+    (void)cParams;
+#endif
+}
+
+static size_t
+ZSTD_transferSequences_wBlockDelim(ZSTD_CCtx* cctx,
+                                   ZSTD_SequencePosition* seqPos,
+                                   const ZSTD_Sequence* const inSeqs, size_t inSeqsSize,
+                                   const void* src, size_t blockSize,
+                                   ZSTD_ParamSwitch_e externalRepSearch);
+
+typedef enum { ZSTDbss_compress, ZSTDbss_noCompress } ZSTD_BuildSeqStore_e;

static size_t ZSTD_buildSeqStore(ZSTD_CCtx* zc, const void* src, size_t srcSize)
{
-   ZSTD_matchState_t* const ms = &zc->blockState.matchState;
+   ZSTD_MatchState_t* const ms = &zc->blockState.matchState;
    DEBUGLOG(5, "ZSTD_buildSeqStore (srcSize=%zu)", srcSize);
    assert(srcSize <= ZSTD_BLOCKSIZE_MAX);
    /* Assert that we have correctly flushed the ctx params into the ms's copy */
    ZSTD_assertEqualCParams(zc->appliedParams.cParams, ms->cParams);
-   if (srcSize < MIN_CBLOCK_SIZE+ZSTD_blockHeaderSize+1) {
+   /* TODO: See 3090. We reduced MIN_CBLOCK_SIZE from 3 to 2 so to compensate we are adding
+    * additional 1. We need to revisit and change this logic to be more consistent */
+   if (srcSize < MIN_CBLOCK_SIZE+ZSTD_blockHeaderSize+1+1) {
        if (zc->appliedParams.cParams.strategy >= ZSTD_btopt) {
            ZSTD_ldm_skipRawSeqStoreBytes(&zc->externSeqStore, srcSize);
        } else {
@@ -2763,6 +3220,15 @@ static size_t ZSTD_buildSeqStore(ZSTD_CCtx* zc, const void* src, size_t srcSize)
    }
    if (zc->externSeqStore.pos < zc->externSeqStore.size) {
        assert(zc->appliedParams.ldmParams.enableLdm == ZSTD_ps_disable);
+
+       /* External matchfinder + LDM is technically possible, just not implemented yet.
+        * We need to revisit soon and implement it. */
+       RETURN_ERROR_IF(
+           ZSTD_hasExtSeqProd(&zc->appliedParams),
+           parameter_combination_unsupported,
+           "Long-distance matching with external sequence producer enabled is not currently supported."
+       );
+
        /* Updates ldmSeqStore.pos */
        lastLLSize =
            ZSTD_ldm_blockCompress(&zc->externSeqStore,
@@ -2772,7 +3238,15 @@ static size_t ZSTD_buildSeqStore(ZSTD_CCtx* zc, const void* src, size_t srcSize)
                                   src, srcSize);
        assert(zc->externSeqStore.pos <= zc->externSeqStore.size);
    } else if (zc->appliedParams.ldmParams.enableLdm == ZSTD_ps_enable) {
-       rawSeqStore_t ldmSeqStore = kNullRawSeqStore;
+       RawSeqStore_t ldmSeqStore = kNullRawSeqStore;
+
+       /* External matchfinder + LDM is technically possible, just not implemented yet.
+        * We need to revisit soon and implement it. */
+       RETURN_ERROR_IF(
+           ZSTD_hasExtSeqProd(&zc->appliedParams),
+           parameter_combination_unsupported,
+           "Long-distance matching with external sequence producer enabled is not currently supported."
+       );

        ldmSeqStore.seq = zc->ldmSequences;
        ldmSeqStore.capacity = zc->maxNbLdmSequences;
@@ -2788,42 +3262,116 @@ static size_t ZSTD_buildSeqStore(ZSTD_CCtx* zc, const void* src, size_t srcSize)
                               zc->appliedParams.useRowMatchFinder,
                               src, srcSize);
        assert(ldmSeqStore.pos == ldmSeqStore.size);
-   } else {   /* not long range mode */
-       ZSTD_blockCompressor const blockCompressor = ZSTD_selectBlockCompressor(zc->appliedParams.cParams.strategy,
-                                                                               zc->appliedParams.useRowMatchFinder,
-                                                                               dictMode);
+   } else if (ZSTD_hasExtSeqProd(&zc->appliedParams)) {
+       assert(
+           zc->extSeqBufCapacity >= ZSTD_sequenceBound(srcSize)
+       );
+       assert(zc->appliedParams.extSeqProdFunc != NULL);
+
+       {   U32 const windowSize = (U32)1 << zc->appliedParams.cParams.windowLog;
+
+           size_t const nbExternalSeqs = (zc->appliedParams.extSeqProdFunc)(
+               zc->appliedParams.extSeqProdState,
+               zc->extSeqBuf,
+               zc->extSeqBufCapacity,
+               src, srcSize,
+               NULL, 0,  /* dict and dictSize, currently not supported */
+               zc->appliedParams.compressionLevel,
+               windowSize
+           );
+
+           size_t const nbPostProcessedSeqs = ZSTD_postProcessSequenceProducerResult(
+               zc->extSeqBuf,
+               nbExternalSeqs,
+               zc->extSeqBufCapacity,
+               srcSize
+           );
+
+           /* Return early if there is no error, since we don't need to worry about last literals */
+           if (!ZSTD_isError(nbPostProcessedSeqs)) {
+               ZSTD_SequencePosition seqPos = {0,0,0};
+               size_t const seqLenSum = ZSTD_fastSequenceLengthSum(zc->extSeqBuf, nbPostProcessedSeqs);
+               RETURN_ERROR_IF(seqLenSum > srcSize, externalSequences_invalid, "External sequences imply too large a block!");
+               FORWARD_IF_ERROR(
+                   ZSTD_transferSequences_wBlockDelim(
+                       zc, &seqPos,
+                       zc->extSeqBuf, nbPostProcessedSeqs,
+                       src, srcSize,
+                       zc->appliedParams.searchForExternalRepcodes
+                   ),
+                   "Failed to copy external sequences to seqStore!"
+               );
+               ms->ldmSeqStore = NULL;
+               DEBUGLOG(5, "Copied %lu sequences from external sequence producer to internal seqStore.", (unsigned long)nbExternalSeqs);
+               return ZSTDbss_compress;
+           }
+
+           /* Propagate the error if fallback is disabled */
+           if (!zc->appliedParams.enableMatchFinderFallback) {
+               return nbPostProcessedSeqs;
+           }
+
+           /* Fallback to software matchfinder */
+           {   ZSTD_BlockCompressor_f const blockCompressor =
+                   ZSTD_selectBlockCompressor(
+                       zc->appliedParams.cParams.strategy,
+                       zc->appliedParams.useRowMatchFinder,
+                       dictMode);
+               ms->ldmSeqStore = NULL;
+               DEBUGLOG(
+                   5,
+                   "External sequence producer returned error code %lu. Falling back to internal parser.",
+                   (unsigned long)nbExternalSeqs
+               );
+               lastLLSize = blockCompressor(ms, &zc->seqStore, zc->blockState.nextCBlock->rep, src, srcSize);
+           }
+       }
+   } else {   /* not long range mode and no external matchfinder */
+       ZSTD_BlockCompressor_f const blockCompressor = ZSTD_selectBlockCompressor(
+           zc->appliedParams.cParams.strategy,
+           zc->appliedParams.useRowMatchFinder,
+           dictMode);
        ms->ldmSeqStore = NULL;
        lastLLSize = blockCompressor(ms, &zc->seqStore, zc->blockState.nextCBlock->rep, src, srcSize);
    }
    {   const BYTE* const lastLiterals = (const BYTE*)src + srcSize - lastLLSize;
        ZSTD_storeLastLiterals(&zc->seqStore, lastLiterals, lastLLSize);
    }
+   ZSTD_validateSeqStore(&zc->seqStore, &zc->appliedParams.cParams);
    return ZSTDbss_compress;
}

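When the producer fails, enableMatchFinderFallback decides between propagating the error and silently re-running the internal parser; LDM and external producers are mutually exclusive for now, as the two RETURN_ERROR_IF checks above enforce. In upstream's experimental API this path is reached through registration plus a parameter, roughly as follows (names from upstream zstd.h; the kernel wrapper may expose different spellings):

    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    ZSTD_registerSequenceProducer(cctx, NULL /* producer state */,
                                  literalsOnlyProducer);
    /* on producer error, fall back to the internal matchfinder
     * instead of failing the whole compression: */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableSeqProducerFallback, 1);
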
-static void ZSTD_copyBlockSequences(ZSTD_CCtx* zc)
+static size_t ZSTD_copyBlockSequences(SeqCollector* seqCollector, const SeqStore_t* seqStore, const U32 prevRepcodes[ZSTD_REP_NUM])
{
-   const seqStore_t* seqStore = ZSTD_getSeqStore(zc);
-   const seqDef* seqStoreSeqs = seqStore->sequencesStart;
-   size_t seqStoreSeqSize = seqStore->sequences - seqStoreSeqs;
-   size_t seqStoreLiteralsSize = (size_t)(seqStore->lit - seqStore->litStart);
-   size_t literalsRead = 0;
-   size_t lastLLSize;
+   const SeqDef* inSeqs = seqStore->sequencesStart;
+   const size_t nbInSequences = (size_t)(seqStore->sequences - inSeqs);
+   const size_t nbInLiterals = (size_t)(seqStore->lit - seqStore->litStart);

-   ZSTD_Sequence* outSeqs = &zc->seqCollector.seqStart[zc->seqCollector.seqIndex];
+   ZSTD_Sequence* outSeqs = seqCollector->seqIndex == 0 ? seqCollector->seqStart : seqCollector->seqStart + seqCollector->seqIndex;
+   const size_t nbOutSequences = nbInSequences + 1;
+   size_t nbOutLiterals = 0;
+   Repcodes_t repcodes;
    size_t i;
-   repcodes_t updatedRepcodes;
-
-   assert(zc->seqCollector.seqIndex + 1 < zc->seqCollector.maxSequences);
-   /* Ensure we have enough space for last literals "sequence" */
-   assert(zc->seqCollector.maxSequences >= seqStoreSeqSize + 1);
-   ZSTD_memcpy(updatedRepcodes.rep, zc->blockState.prevCBlock->rep, sizeof(repcodes_t));
-   for (i = 0; i < seqStoreSeqSize; ++i) {
-       U32 rawOffset = seqStoreSeqs[i].offBase - ZSTD_REP_NUM;
-       outSeqs[i].litLength = seqStoreSeqs[i].litLength;
-       outSeqs[i].matchLength = seqStoreSeqs[i].mlBase + MINMATCH;
+
+   /* Bounds check that we have enough space for every input sequence
+    * and the block delimiter
+    */
+   assert(seqCollector->seqIndex <= seqCollector->maxSequences);
+   RETURN_ERROR_IF(
+       nbOutSequences > (size_t)(seqCollector->maxSequences - seqCollector->seqIndex),
+       dstSize_tooSmall,
+       "Not enough space to copy sequences");
+
+   ZSTD_memcpy(&repcodes, prevRepcodes, sizeof(repcodes));
+   for (i = 0; i < nbInSequences; ++i) {
+       U32 rawOffset;
+       outSeqs[i].litLength = inSeqs[i].litLength;
+       outSeqs[i].matchLength = inSeqs[i].mlBase + MINMATCH;
        outSeqs[i].rep = 0;

+       /* Handle the possible single length >= 64K
+        * There can only be one because we add MINMATCH to every match length,
+        * and blocks are at most 128K.
+        */
        if (i == seqStore->longLengthPos) {
            if (seqStore->longLengthType == ZSTD_llt_literalLength) {
                outSeqs[i].litLength += 0x10000;
@@ -2832,46 +3380,75 @@ static void ZSTD_copyBlockSequences(ZSTD_CCtx* zc)
            }
        }

-       if (seqStoreSeqs[i].offBase <= ZSTD_REP_NUM) {
-           /* Derive the correct offset corresponding to a repcode */
-           outSeqs[i].rep = seqStoreSeqs[i].offBase;
+       /* Determine the raw offset given the offBase, which may be a repcode. */
+       if (OFFBASE_IS_REPCODE(inSeqs[i].offBase)) {
+           const U32 repcode = OFFBASE_TO_REPCODE(inSeqs[i].offBase);
+           assert(repcode > 0);
+           outSeqs[i].rep = repcode;
            if (outSeqs[i].litLength != 0) {
-               rawOffset = updatedRepcodes.rep[outSeqs[i].rep - 1];
+               rawOffset = repcodes.rep[repcode - 1];
            } else {
-               if (outSeqs[i].rep == 3) {
-                   rawOffset = updatedRepcodes.rep[0] - 1;
+               if (repcode == 3) {
+                   assert(repcodes.rep[0] > 1);
+                   rawOffset = repcodes.rep[0] - 1;
                } else {
-                   rawOffset = updatedRepcodes.rep[outSeqs[i].rep];
+                   rawOffset = repcodes.rep[repcode];
                }
            }
+       } else {
+           rawOffset = OFFBASE_TO_OFFSET(inSeqs[i].offBase);
        }
        outSeqs[i].offset = rawOffset;
-       /* seqStoreSeqs[i].offset == offCode+1, and ZSTD_updateRep() expects offCode
-          so we provide seqStoreSeqs[i].offset - 1 */
-       ZSTD_updateRep(updatedRepcodes.rep,
-                      seqStoreSeqs[i].offBase - 1,
-                      seqStoreSeqs[i].litLength == 0);
-       literalsRead += outSeqs[i].litLength;
+
+       /* Update repcode history for the sequence */
+       ZSTD_updateRep(repcodes.rep,
+                      inSeqs[i].offBase,
+                      inSeqs[i].litLength == 0);
+
+       nbOutLiterals += outSeqs[i].litLength;
    }
    /* Insert last literals (if any exist) in the block as a sequence with ml == off == 0.
     * If there are no last literals, then we'll emit (of: 0, ml: 0, ll: 0), which is a marker
     * for the block boundary, according to the API.
     */
-   assert(seqStoreLiteralsSize >= literalsRead);
-   lastLLSize = seqStoreLiteralsSize - literalsRead;
-   outSeqs[i].litLength = (U32)lastLLSize;
-   outSeqs[i].matchLength = outSeqs[i].offset = outSeqs[i].rep = 0;
-   seqStoreSeqSize++;
-   zc->seqCollector.seqIndex += seqStoreSeqSize;
+   assert(nbInLiterals >= nbOutLiterals);
+   {
+       const size_t lastLLSize = nbInLiterals - nbOutLiterals;
+       outSeqs[nbInSequences].litLength = (U32)lastLLSize;
+       outSeqs[nbInSequences].matchLength = 0;
+       outSeqs[nbInSequences].offset = 0;
+       assert(nbOutSequences == nbInSequences + 1);
+   }
+   seqCollector->seqIndex += nbOutSequences;
+   assert(seqCollector->seqIndex <= seqCollector->maxSequences);
+
+   return 0;
+}
+
+size_t ZSTD_sequenceBound(size_t srcSize) {
+   const size_t maxNbSeq = (srcSize / ZSTD_MINMATCH_MIN) + 1;
+   const size_t maxNbDelims = (srcSize / ZSTD_BLOCKSIZE_MAX_MIN) + 1;
+   return maxNbSeq + maxNbDelims;
+}

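ZSTD_sequenceBound() is deliberately coarse: one sequence per minimal match plus one delimiter per minimal block. Worked through with the usual constants (assuming ZSTD_MINMATCH_MIN == 3 and ZSTD_BLOCKSIZE_MAX_MIN == 1 KB), a 128 KB input gives:

    maxNbSeq    = 131072 / 3    + 1 = 43691
    maxNbDelims = 131072 / 1024 + 1 =   129
    ZSTD_sequenceBound(131072)      = 43820 sequences
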
size_t ZSTD_generateSequences(ZSTD_CCtx* zc, ZSTD_Sequence* outSeqs,
                              size_t outSeqsSize, const void* src, size_t srcSize)
{
    const size_t dstCapacity = ZSTD_compressBound(srcSize);
-   void* dst = ZSTD_customMalloc(dstCapacity, ZSTD_defaultCMem);
+   void* dst; /* Make C90 happy. */
    SeqCollector seqCollector;
+   {
+       int targetCBlockSize;
+       FORWARD_IF_ERROR(ZSTD_CCtx_getParameter(zc, ZSTD_c_targetCBlockSize, &targetCBlockSize), "");
+       RETURN_ERROR_IF(targetCBlockSize != 0, parameter_unsupported, "targetCBlockSize != 0");
+   }
+   {
+       int nbWorkers;
+       FORWARD_IF_ERROR(ZSTD_CCtx_getParameter(zc, ZSTD_c_nbWorkers, &nbWorkers), "");
+       RETURN_ERROR_IF(nbWorkers != 0, parameter_unsupported, "nbWorkers != 0");
+   }

+   dst = ZSTD_customMalloc(dstCapacity, ZSTD_defaultCMem);
    RETURN_ERROR_IF(dst == NULL, memory_allocation, "NULL pointer!");

    seqCollector.collectSequences = 1;
@@ -2880,8 +3457,12 @@ size_t ZSTD_generateSequences(ZSTD_CCtx* zc, ZSTD_Sequence* outSeqs,
    seqCollector.maxSequences = outSeqsSize;
    zc->seqCollector = seqCollector;

-   ZSTD_compress2(zc, dst, dstCapacity, src, srcSize);
-   ZSTD_customFree(dst, ZSTD_defaultCMem);
+   {
+       const size_t ret = ZSTD_compress2(zc, dst, dstCapacity, src, srcSize);
+       ZSTD_customFree(dst, ZSTD_defaultCMem);
+       FORWARD_IF_ERROR(ret, "ZSTD_compress2 failed");
+   }
+   assert(zc->seqCollector.seqIndex <= ZSTD_sequenceBound(srcSize));
    return zc->seqCollector.seqIndex;
}

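With the new error forwarding, ZSTD_generateSequences() now returns either a sequence count or an error code, so a caller should size the output with ZSTD_sequenceBound() and test the result. A usage sketch (error handling abbreviated):

    size_t const maxSeqs = ZSTD_sequenceBound(srcSize);
    ZSTD_Sequence* const seqs = malloc(maxSeqs * sizeof(*seqs));
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    size_t const nbSeqs = ZSTD_generateSequences(cctx, seqs, maxSeqs, src, srcSize);
    if (ZSTD_isError(nbSeqs)) { /* handle error */ }
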
@@ -2910,19 +3491,17 @@ static int ZSTD_isRLE(const BYTE* src, size_t length) {
    const size_t unrollMask = unrollSize - 1;
    const size_t prefixLength = length & unrollMask;
    size_t i;
-   size_t u;
    if (length == 1) return 1;
    /* Check if prefix is RLE first before using unrolled loop */
    if (prefixLength && ZSTD_count(ip+1, ip, ip+prefixLength) != prefixLength-1) {
        return 0;
    }
    for (i = prefixLength; i != length; i += unrollSize) {
+       size_t u;
        for (u = 0; u < unrollSize; u += sizeof(size_t)) {
            if (MEM_readST(ip + i + u) != valueST) {
                return 0;
-           }
-       }
-   }
+   }   }   }
    return 1;
}

@@ -2930,7 +3509,7 @@ static int ZSTD_isRLE(const BYTE* src, size_t length) {
 * This is just a heuristic based on the compressibility.
 * It may return both false positives and false negatives.
 */
-static int ZSTD_maybeRLE(seqStore_t const* seqStore)
+static int ZSTD_maybeRLE(SeqStore_t const* seqStore)
{
    size_t const nbSeqs = (size_t)(seqStore->sequences - seqStore->sequencesStart);
    size_t const nbLits = (size_t)(seqStore->lit - seqStore->litStart);
@@ -2938,7 +3517,8 @@ static int ZSTD_maybeRLE(seqStore_t const* seqStore)
    return nbSeqs < 4 && nbLits < 10;
}

-static void ZSTD_blockState_confirmRepcodesAndEntropyTables(ZSTD_blockState_t* const bs)
+static void
+ZSTD_blockState_confirmRepcodesAndEntropyTables(ZSTD_blockState_t* const bs)
{
    ZSTD_compressedBlockState_t* const tmp = bs->prevCBlock;
    bs->prevCBlock = bs->nextCBlock;
@@ -2946,12 +3526,14 @@ static void ZSTD_blockState_confirmRepcodesAndEntropyTables(ZSTD_blockState_t* c
}

/* Writes the block header */
-static void writeBlockHeader(void* op, size_t cSize, size_t blockSize, U32 lastBlock) {
+static void
+writeBlockHeader(void* op, size_t cSize, size_t blockSize, U32 lastBlock)
+{
    U32 const cBlockHeader = cSize == 1 ?
                             lastBlock + (((U32)bt_rle)<<1) + (U32)(blockSize << 3) :
                             lastBlock + (((U32)bt_compressed)<<1) + (U32)(cSize << 3);
    MEM_writeLE24(op, cBlockHeader);
-   DEBUGLOG(3, "writeBlockHeader: cSize: %zu blockSize: %zu lastBlock: %u", cSize, blockSize, lastBlock);
+   DEBUGLOG(5, "writeBlockHeader: cSize: %zu blockSize: %zu lastBlock: %u", cSize, blockSize, lastBlock);
}

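writeBlockHeader() packs the 3-byte block header as lastBlock in bit 0, the block type in bits 1-2, and a size field from bit 3 up. A worked example: a non-last compressed block whose payload is cSize == 200 bytes (with bt_compressed == 2) yields

    header = 0 + (2 << 1) + (200 << 3) = 4 + 1600 = 1604 = 0x000644

written little-endian by MEM_writeLE24().
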
@@ -2959,13 +3541,16 @@ static void writeBlockHeader(void* op, size_t cSize, size_t blockSize, U32 lastB
 * Stores literals block type (raw, rle, compressed, repeat) and
 * huffman description table to hufMetadata.
 * Requires ENTROPY_WORKSPACE_SIZE workspace
- * @return : size of huffman description table or error code */
-static size_t ZSTD_buildBlockEntropyStats_literals(void* const src, size_t srcSize,
-                                            const ZSTD_hufCTables_t* prevHuf,
-                                                  ZSTD_hufCTables_t* nextHuf,
-                                                  ZSTD_hufCTablesMetadata_t* hufMetadata,
-                                                  const int literalsCompressionIsDisabled,
-                                                  void* workspace, size_t wkspSize)
+ * @return : size of huffman description table, or an error code
+ */
+static size_t
+ZSTD_buildBlockEntropyStats_literals(void* const src, size_t srcSize,
+                                     const ZSTD_hufCTables_t* prevHuf,
+                                     ZSTD_hufCTables_t* nextHuf,
+                                     ZSTD_hufCTablesMetadata_t* hufMetadata,
+                                     const int literalsCompressionIsDisabled,
+                                     void* workspace, size_t wkspSize,
+                                     int hufFlags)
{
    BYTE* const wkspStart = (BYTE*)workspace;
    BYTE* const wkspEnd = wkspStart + wkspSize;
@@ -2973,9 +3558,9 @@ static size_t ZSTD_buildBlockEntropyStats_literals(void* const src, size_t srcSi
    unsigned* const countWksp = (unsigned*)workspace;
    const size_t countWkspSize = (HUF_SYMBOLVALUE_MAX + 1) * sizeof(unsigned);
    BYTE* const nodeWksp = countWkspStart + countWkspSize;
-   const size_t nodeWkspSize = wkspEnd-nodeWksp;
+   const size_t nodeWkspSize = (size_t)(wkspEnd - nodeWksp);
    unsigned maxSymbolValue = HUF_SYMBOLVALUE_MAX;
-   unsigned huffLog = HUF_TABLELOG_DEFAULT;
+   unsigned huffLog = LitHufLog;
    HUF_repeat repeat = prevHuf->repeatMode;
    DEBUGLOG(5, "ZSTD_buildBlockEntropyStats_literals (srcSize=%zu)", srcSize);

@@ -2990,73 +3575,77 @@ static size_t ZSTD_buildBlockEntropyStats_literals(void* const src, size_t srcSi

    /* small ? don't even attempt compression (speed opt) */
#ifndef COMPRESS_LITERALS_SIZE_MIN
-#define COMPRESS_LITERALS_SIZE_MIN 63
+# define COMPRESS_LITERALS_SIZE_MIN 63  /* heuristic */
#endif
    {   size_t const minLitSize = (prevHuf->repeatMode == HUF_repeat_valid) ? 6 : COMPRESS_LITERALS_SIZE_MIN;
        if (srcSize <= minLitSize) {
            DEBUGLOG(5, "set_basic - too small");
            hufMetadata->hType = set_basic;
            return 0;
-       }
-   }
+   }   }

    /* Scan input and build symbol stats */
-   {   size_t const largest = HIST_count_wksp (countWksp, &maxSymbolValue, (const BYTE*)src, srcSize, workspace, wkspSize);
+   {   size_t const largest =
+           HIST_count_wksp (countWksp, &maxSymbolValue,
+                            (const BYTE*)src, srcSize,
+                            workspace, wkspSize);
        FORWARD_IF_ERROR(largest, "HIST_count_wksp failed");
        if (largest == srcSize) {
+           /* only one literal symbol */
            DEBUGLOG(5, "set_rle");
            hufMetadata->hType = set_rle;
            return 0;
        }
        if (largest <= (srcSize >> 7)+4) {
+           /* heuristic: likely not compressible */
            DEBUGLOG(5, "set_basic - no gain");
            hufMetadata->hType = set_basic;
            return 0;
-       }
-   }
+   }   }

    /* Validate the previous Huffman table */
-   if (repeat == HUF_repeat_check && !HUF_validateCTable((HUF_CElt const*)prevHuf->CTable, countWksp, maxSymbolValue)) {
+   if (repeat == HUF_repeat_check
+     && !HUF_validateCTable((HUF_CElt const*)prevHuf->CTable, countWksp, maxSymbolValue)) {
        repeat = HUF_repeat_none;
    }

    /* Build Huffman Tree */
    ZSTD_memset(nextHuf->CTable, 0, sizeof(nextHuf->CTable));
-   huffLog = HUF_optimalTableLog(huffLog, srcSize, maxSymbolValue);
+   huffLog = HUF_optimalTableLog(huffLog, srcSize, maxSymbolValue, nodeWksp, nodeWkspSize, nextHuf->CTable, countWksp, hufFlags);
+   assert(huffLog <= LitHufLog);
    {   size_t const maxBits = HUF_buildCTable_wksp((HUF_CElt*)nextHuf->CTable, countWksp,
                                                    maxSymbolValue, huffLog,
                                                    nodeWksp, nodeWkspSize);
        FORWARD_IF_ERROR(maxBits, "HUF_buildCTable_wksp");
        huffLog = (U32)maxBits;
-       {   /* Build and write the CTable */
-           size_t const newCSize = HUF_estimateCompressedSize(
-               (HUF_CElt*)nextHuf->CTable, countWksp, maxSymbolValue);
-           size_t const hSize = HUF_writeCTable_wksp(
-               hufMetadata->hufDesBuffer, sizeof(hufMetadata->hufDesBuffer),
-               (HUF_CElt*)nextHuf->CTable, maxSymbolValue, huffLog,
-               nodeWksp, nodeWkspSize);
-           /* Check against repeating the previous CTable */
-           if (repeat != HUF_repeat_none) {
-               size_t const oldCSize = HUF_estimateCompressedSize(
-                   (HUF_CElt const*)prevHuf->CTable, countWksp, maxSymbolValue);
-               if (oldCSize < srcSize && (oldCSize <= hSize + newCSize || hSize + 12 >= srcSize)) {
-                   DEBUGLOG(5, "set_repeat - smaller");
-                   ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf));
-                   hufMetadata->hType = set_repeat;
-                   return 0;
-               }
-           }
-           if (newCSize + hSize >= srcSize) {
-               DEBUGLOG(5, "set_basic - no gains");
-               ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf));
-               hufMetadata->hType = set_basic;
-               return 0;
-           }
-           DEBUGLOG(5, "set_compressed (hSize=%u)", (U32)hSize);
-           hufMetadata->hType = set_compressed;
-           nextHuf->repeatMode = HUF_repeat_check;
-           return hSize;
-       }
+   }
+   {   /* Build and write the CTable */
+       size_t const newCSize = HUF_estimateCompressedSize(
+           (HUF_CElt*)nextHuf->CTable, countWksp, maxSymbolValue);
+       size_t const hSize = HUF_writeCTable_wksp(
+           hufMetadata->hufDesBuffer, sizeof(hufMetadata->hufDesBuffer),
+           (HUF_CElt*)nextHuf->CTable, maxSymbolValue, huffLog,
+           nodeWksp, nodeWkspSize);
+       /* Check against repeating the previous CTable */
+       if (repeat != HUF_repeat_none) {
+           size_t const oldCSize = HUF_estimateCompressedSize(
+               (HUF_CElt const*)prevHuf->CTable, countWksp, maxSymbolValue);
+           if (oldCSize < srcSize && (oldCSize <= hSize + newCSize || hSize + 12 >= srcSize)) {
+               DEBUGLOG(5, "set_repeat - smaller");
+               ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf));
+               hufMetadata->hType = set_repeat;
+               return 0;
+       }   }
+       if (newCSize + hSize >= srcSize) {
+           DEBUGLOG(5, "set_basic - no gains");
+           ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf));
+           hufMetadata->hType = set_basic;
+           return 0;
+       }
+       DEBUGLOG(5, "set_compressed (hSize=%u)", (U32)hSize);
+       hufMetadata->hType = set_compressed;
+       nextHuf->repeatMode = HUF_repeat_check;
+       return hSize;
    }
}

@@ -3066,8 +3655,9 @@ static size_t ZSTD_buildBlockEntropyStats_literals(void* const src, size_t srcSi
 * and updates nextEntropy to the appropriate repeatMode.
 */
static ZSTD_symbolEncodingTypeStats_t
-ZSTD_buildDummySequencesStatistics(ZSTD_fseCTables_t* nextEntropy) {
-   ZSTD_symbolEncodingTypeStats_t stats = {set_basic, set_basic, set_basic, 0, 0};
+ZSTD_buildDummySequencesStatistics(ZSTD_fseCTables_t* nextEntropy)
+{
+   ZSTD_symbolEncodingTypeStats_t stats = {set_basic, set_basic, set_basic, 0, 0, 0};
    nextEntropy->litlength_repeatMode = FSE_repeat_none;
    nextEntropy->offcode_repeatMode = FSE_repeat_none;
    nextEntropy->matchlength_repeatMode = FSE_repeat_none;
@@ -3078,16 +3668,18 @@ ZSTD_buildDummySequencesStatistics(ZSTD_fseCTables_t* nextEntropy)
 * Builds entropy for the sequences.
 * Stores symbol compression modes and fse table to fseMetadata.
 * Requires ENTROPY_WORKSPACE_SIZE wksp.
- * @return : size of fse tables or error code */
-static size_t ZSTD_buildBlockEntropyStats_sequences(seqStore_t* seqStorePtr,
-                                              const ZSTD_fseCTables_t* prevEntropy,
-                                                    ZSTD_fseCTables_t* nextEntropy,
-                                              const ZSTD_CCtx_params* cctxParams,
-                                                    ZSTD_fseCTablesMetadata_t* fseMetadata,
-                                                    void* workspace, size_t wkspSize)
+ * @return : size of fse tables or error code */
+static size_t
+ZSTD_buildBlockEntropyStats_sequences(
+               const SeqStore_t* seqStorePtr,
+               const ZSTD_fseCTables_t* prevEntropy,
+                     ZSTD_fseCTables_t* nextEntropy,
+               const ZSTD_CCtx_params* cctxParams,
+                     ZSTD_fseCTablesMetadata_t* fseMetadata,
+               void* workspace, size_t wkspSize)
{
    ZSTD_strategy const strategy = cctxParams->cParams.strategy;
-   size_t const nbSeq = seqStorePtr->sequences - seqStorePtr->sequencesStart;
+   size_t const nbSeq = (size_t)(seqStorePtr->sequences - seqStorePtr->sequencesStart);
    BYTE* const ostart = fseMetadata->fseTablesBuffer;
    BYTE* const oend = ostart + sizeof(fseMetadata->fseTablesBuffer);
    BYTE* op = ostart;
@@ -3103,9 +3695,9 @@ static size_t ZSTD_buildBlockEntropyStats_sequences(seqStore_t* seqStorePtr,
                                          entropyWorkspace, entropyWorkspaceSize)
                       : ZSTD_buildDummySequencesStatistics(nextEntropy);
    FORWARD_IF_ERROR(stats.size, "ZSTD_buildSequencesStatistics failed!");
-   fseMetadata->llType = (symbolEncodingType_e) stats.LLtype;
-   fseMetadata->ofType = (symbolEncodingType_e) stats.Offtype;
-   fseMetadata->mlType = (symbolEncodingType_e) stats.MLtype;
+   fseMetadata->llType = (SymbolEncodingType_e) stats.LLtype;
+   fseMetadata->ofType = (SymbolEncodingType_e) stats.Offtype;
+   fseMetadata->mlType = (SymbolEncodingType_e) stats.MLtype;
    fseMetadata->lastCountSize = stats.lastCountSize;
    return stats.size;
}

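Condensed, the literals-mode decision implemented above is a cascade of cheap early exits before the expensive table build (a paraphrase, not part of the patch):

    srcSize <= minLitSize           -> set_basic  (too small to bother)
    largest == srcSize              -> set_rle    (single repeated symbol)
    largest <= (srcSize >> 7) + 4   -> set_basic  (near-flat histogram)
    otherwise build a CTable; prefer set_repeat if reusing the previous
    table is estimated cheaper, set_basic if no byte is saved,
    else set_compressed and return the table description size.
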
@@ -3114,23 +3706,28 @@ static size_t ZSTD_buildBlockEntropyStats_sequences(seqStore_t* seqStorePtr,
/* ZSTD_buildBlockEntropyStats() :
 * Builds entropy for the block.
 * Requires workspace size ENTROPY_WORKSPACE_SIZE
- *
- * @return : 0 on success or error code
+ * @return : 0 on success, or an error code
+ * Note : also employed in superblock
 */
-size_t ZSTD_buildBlockEntropyStats(seqStore_t* seqStorePtr,
-                             const ZSTD_entropyCTables_t* prevEntropy,
-                                   ZSTD_entropyCTables_t* nextEntropy,
-                             const ZSTD_CCtx_params* cctxParams,
-                                   ZSTD_entropyCTablesMetadata_t* entropyMetadata,
-                                   void* workspace, size_t wkspSize)
-{
-   size_t const litSize = seqStorePtr->lit - seqStorePtr->litStart;
+size_t ZSTD_buildBlockEntropyStats(
+               const SeqStore_t* seqStorePtr,
+               const ZSTD_entropyCTables_t* prevEntropy,
+                     ZSTD_entropyCTables_t* nextEntropy,
+               const ZSTD_CCtx_params* cctxParams,
+                     ZSTD_entropyCTablesMetadata_t* entropyMetadata,
+               void* workspace, size_t wkspSize)
+{
+   size_t const litSize = (size_t)(seqStorePtr->lit - seqStorePtr->litStart);
+   int const huf_useOptDepth = (cctxParams->cParams.strategy >= HUF_OPTIMAL_DEPTH_THRESHOLD);
+   int const hufFlags = huf_useOptDepth ? HUF_flags_optimalDepth : 0;
+
    entropyMetadata->hufMetadata.hufDesSize =
        ZSTD_buildBlockEntropyStats_literals(seqStorePtr->litStart, litSize,
                                            &prevEntropy->huf, &nextEntropy->huf,
                                            &entropyMetadata->hufMetadata,
                                            ZSTD_literalsCompressionIsDisabled(cctxParams),
-                                           workspace, wkspSize);
+                                           workspace, wkspSize, hufFlags);
+   FORWARD_IF_ERROR(entropyMetadata->hufMetadata.hufDesSize, "ZSTD_buildBlockEntropyStats_literals failed");
    entropyMetadata->fseMetadata.fseTablesSize =
        ZSTD_buildBlockEntropyStats_sequences(seqStorePtr,
@@ -3143,11 +3740,12 @@ size_t ZSTD_buildBlockEntropyStats(seqStore_t* seqStorePtr,
}

/* Returns the size estimate for the literals section (header + content) of a block */
-static size_t ZSTD_estimateBlockSize_literal(const BYTE* literals, size_t litSize,
-                                             const ZSTD_hufCTables_t* huf,
-                                             const ZSTD_hufCTablesMetadata_t* hufMetadata,
-                                             void* workspace, size_t wkspSize,
-                                             int writeEntropy)
+static size_t
+ZSTD_estimateBlockSize_literal(const BYTE* literals, size_t litSize,
+                               const ZSTD_hufCTables_t* huf,
+                               const ZSTD_hufCTablesMetadata_t* hufMetadata,
+                               void* workspace, size_t wkspSize,
+                               int writeEntropy)
{
    unsigned* const countWksp = (unsigned*)workspace;
    unsigned maxSymbolValue = HUF_SYMBOLVALUE_MAX;
@@ -3169,12 +3767,13 @@ static size_t ZSTD_estimateBlockSize_literal(const BYTE* literals, size_t litSiz
}

/* Returns the size estimate for the FSE-compressed symbols (of, ml, ll) of a block */
-static size_t ZSTD_estimateBlockSize_symbolType(symbolEncodingType_e type,
-                   const BYTE* codeTable, size_t nbSeq, unsigned maxCode,
-                   const FSE_CTable* fseCTable,
-                   const U8* additionalBits,
-                   short const* defaultNorm, U32 defaultNormLog, U32 defaultMax,
-                   void* workspace, size_t wkspSize)
+static size_t
+ZSTD_estimateBlockSize_symbolType(SymbolEncodingType_e type,
+                   const BYTE* codeTable, size_t nbSeq, unsigned maxCode,
+                   const FSE_CTable* fseCTable,
+                   const U8* additionalBits,
+                   short const* defaultNorm, U32 defaultNormLog, U32 defaultMax,
+                   void* workspace, size_t wkspSize)
{
    unsigned* const countWksp = (unsigned*)workspace;
    const BYTE* ctp = codeTable;
@@ -3206,116 +3805,121 @@ static size_t ZSTD_estimateBlockSize_symbolType(symbolEncodingType_e type,
}

/* Returns the size estimate for the sequences section (header + content) of a block */
-static size_t ZSTD_estimateBlockSize_sequences(const BYTE* ofCodeTable,
-                                               const BYTE* llCodeTable,
-                                               const BYTE* mlCodeTable,
-                                               size_t nbSeq,
-                                               const ZSTD_fseCTables_t* fseTables,
-                                               const ZSTD_fseCTablesMetadata_t* fseMetadata,
-                                               void* workspace, size_t wkspSize,
-                                               int writeEntropy)
+static size_t
+ZSTD_estimateBlockSize_sequences(const BYTE* ofCodeTable,
+                                 const BYTE* llCodeTable,
+                                 const BYTE* mlCodeTable,
+                                 size_t nbSeq,
+                                 const ZSTD_fseCTables_t* fseTables,
+                                 const ZSTD_fseCTablesMetadata_t* fseMetadata,
+                                 void* workspace, size_t wkspSize,
+                                 int writeEntropy)
{
    size_t sequencesSectionHeaderSize = 1 /* seqHead */ + 1 /* min seqSize size */ + (nbSeq >= 128) + (nbSeq >= LONGNBSEQ);
    size_t cSeqSizeEstimate = 0;
    cSeqSizeEstimate += ZSTD_estimateBlockSize_symbolType(fseMetadata->ofType, ofCodeTable, nbSeq, MaxOff,
-                                        fseTables->offcodeCTable, NULL,
-                                        OF_defaultNorm, OF_defaultNormLog, DefaultMaxOff,
-                                        workspace, wkspSize);
+                                        fseTables->offcodeCTable, NULL,
+                                        OF_defaultNorm, OF_defaultNormLog, DefaultMaxOff,
+                                        workspace, wkspSize);
    cSeqSizeEstimate += ZSTD_estimateBlockSize_symbolType(fseMetadata->llType, llCodeTable, nbSeq, MaxLL,
-                                        fseTables->litlengthCTable, LL_bits,
-                                        LL_defaultNorm, LL_defaultNormLog, MaxLL,
-                                        workspace, wkspSize);
+                                        fseTables->litlengthCTable, LL_bits,
+                                        LL_defaultNorm, LL_defaultNormLog, MaxLL,
+                                        workspace, wkspSize);
    cSeqSizeEstimate += ZSTD_estimateBlockSize_symbolType(fseMetadata->mlType, mlCodeTable, nbSeq, MaxML,
-                                        fseTables->matchlengthCTable, ML_bits,
-                                        ML_defaultNorm, ML_defaultNormLog, MaxML,
-                                        workspace, wkspSize);
+                                        fseTables->matchlengthCTable, ML_bits,
+                                        ML_defaultNorm, ML_defaultNormLog, MaxML,
+                                        workspace, wkspSize);
    if (writeEntropy) cSeqSizeEstimate += fseMetadata->fseTablesSize;
    return cSeqSizeEstimate + sequencesSectionHeaderSize;
}

/* Returns the size estimate for a given stream of literals, of, ll, ml */
-static size_t ZSTD_estimateBlockSize(const BYTE* literals, size_t litSize,
-                                     const BYTE* ofCodeTable,
-                                     const BYTE* llCodeTable,
-                                     const BYTE* mlCodeTable,
-                                     size_t nbSeq,
-                                     const ZSTD_entropyCTables_t* entropy,
-                                     const ZSTD_entropyCTablesMetadata_t* entropyMetadata,
-                                     void* workspace, size_t wkspSize,
-                                     int writeLitEntropy, int writeSeqEntropy) {
+static size_t
+ZSTD_estimateBlockSize(const BYTE* literals, size_t litSize,
+                       const BYTE* ofCodeTable,
+                       const BYTE* llCodeTable,
+                       const BYTE* mlCodeTable,
+                       size_t nbSeq,
+                       const ZSTD_entropyCTables_t* entropy,
+                       const ZSTD_entropyCTablesMetadata_t* entropyMetadata,
+                       void* workspace, size_t wkspSize,
+                       int writeLitEntropy, int writeSeqEntropy)
+{
    size_t const literalsSize = ZSTD_estimateBlockSize_literal(literals, litSize,
-                                                        &entropy->huf, &entropyMetadata->hufMetadata,
-                                                        workspace, wkspSize, writeLitEntropy);
+                                        &entropy->huf, &entropyMetadata->hufMetadata,
+                                        workspace, wkspSize, writeLitEntropy);
    size_t const seqSize = ZSTD_estimateBlockSize_sequences(ofCodeTable, llCodeTable, mlCodeTable,
-                                                        nbSeq, &entropy->fse, &entropyMetadata->fseMetadata,
-                                                        workspace, wkspSize, writeSeqEntropy);
+                                        nbSeq, &entropy->fse, &entropyMetadata->fseMetadata,
+                                        workspace, wkspSize, writeSeqEntropy);
    return seqSize + literalsSize + ZSTD_blockHeaderSize;
}

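The estimate is compositional, mirroring the on-wire block layout:

    estimate = literalsSection  (header + Huffman-coded literals)
             + sequencesSection (header + FSE-coded of/ll/ml streams)
             + ZSTD_blockHeaderSize (3 bytes)

with entropy-table costs counted only when the corresponding writeEntropy flag is set, i.e. when fresh tables would actually be emitted.
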
/* Builds entropy statistics and uses them for blocksize estimation.
 *
- * Returns the estimated compressed size of the seqStore, or a zstd error.
+ * @return: estimated compressed size of the seqStore, or a zstd error.
 */
-static size_t ZSTD_buildEntropyStatisticsAndEstimateSubBlockSize(seqStore_t* seqStore, ZSTD_CCtx* zc) {
-   ZSTD_entropyCTablesMetadata_t* entropyMetadata = &zc->blockSplitCtx.entropyMetadata;
+static size_t
+ZSTD_buildEntropyStatisticsAndEstimateSubBlockSize(SeqStore_t* seqStore, ZSTD_CCtx* zc)
+{
+   ZSTD_entropyCTablesMetadata_t* const entropyMetadata = &zc->blockSplitCtx.entropyMetadata;
    DEBUGLOG(6, "ZSTD_buildEntropyStatisticsAndEstimateSubBlockSize()");
    FORWARD_IF_ERROR(ZSTD_buildBlockEntropyStats(seqStore,
                    &zc->blockState.prevCBlock->entropy,
                    &zc->blockState.nextCBlock->entropy,
                    &zc->appliedParams,
                    entropyMetadata,
-                   zc->entropyWorkspace, ENTROPY_WORKSPACE_SIZE /* statically allocated in resetCCtx */), "");
-   return ZSTD_estimateBlockSize(seqStore->litStart, (size_t)(seqStore->lit - seqStore->litStart),
+                   zc->tmpWorkspace, zc->tmpWkspSize), "");
+   return ZSTD_estimateBlockSize(
+                   seqStore->litStart, (size_t)(seqStore->lit - seqStore->litStart),
                    seqStore->ofCode, seqStore->llCode, seqStore->mlCode,
                    (size_t)(seqStore->sequences - seqStore->sequencesStart),
-                   &zc->blockState.nextCBlock->entropy, entropyMetadata, zc->entropyWorkspace, ENTROPY_WORKSPACE_SIZE,
+                   &zc->blockState.nextCBlock->entropy,
+                   entropyMetadata,
+                   zc->tmpWorkspace, zc->tmpWkspSize,
                    (int)(entropyMetadata->hufMetadata.hType == set_compressed), 1);
}

/* Returns literals bytes represented in a seqStore */
-static size_t ZSTD_countSeqStoreLiteralsBytes(const seqStore_t* const seqStore) {
+static size_t ZSTD_countSeqStoreLiteralsBytes(const SeqStore_t* const seqStore)
+{
    size_t literalsBytes = 0;
-   size_t const nbSeqs = seqStore->sequences - seqStore->sequencesStart;
+   size_t const nbSeqs = (size_t)(seqStore->sequences - seqStore->sequencesStart);
    size_t i;
    for (i = 0; i < nbSeqs; ++i) {
-       seqDef seq = seqStore->sequencesStart[i];
+       SeqDef const seq = seqStore->sequencesStart[i];
        literalsBytes += seq.litLength;
        if (i == seqStore->longLengthPos && seqStore->longLengthType == ZSTD_llt_literalLength) {
            literalsBytes += 0x10000;
-       }
-   }
+   }   }
    return literalsBytes;
}

/* Returns match bytes represented in a seqStore */
-static size_t ZSTD_countSeqStoreMatchBytes(const seqStore_t* const seqStore) {
+static size_t ZSTD_countSeqStoreMatchBytes(const SeqStore_t* const seqStore)
+{
    size_t matchBytes = 0;
-   size_t const nbSeqs = seqStore->sequences - seqStore->sequencesStart;
+   size_t const nbSeqs = (size_t)(seqStore->sequences - seqStore->sequencesStart);
    size_t i;
    for (i = 0; i < nbSeqs; ++i) {
-       seqDef seq = seqStore->sequencesStart[i];
+       SeqDef seq = seqStore->sequencesStart[i];
        matchBytes += seq.mlBase + MINMATCH;
        if (i == seqStore->longLengthPos && seqStore->longLengthType == ZSTD_llt_matchLength) {
            matchBytes += 0x10000;
-       }
-   }
+   }   }
    return matchBytes;
}

/* Derives the seqStore that is a chunk of the originalSeqStore from [startIdx, endIdx).
 * Stores the result in resultSeqStore.
 */
-static void ZSTD_deriveSeqStoreChunk(seqStore_t* resultSeqStore,
-                                     const seqStore_t* originalSeqStore,
-                                     size_t startIdx, size_t endIdx) {
-   BYTE* const litEnd = originalSeqStore->lit;
-   size_t literalsBytes;
-   size_t literalsBytesPreceding = 0;
-
+static void ZSTD_deriveSeqStoreChunk(SeqStore_t* resultSeqStore,
+                                     const SeqStore_t* originalSeqStore,
+                                     size_t startIdx, size_t endIdx)
+{
    *resultSeqStore = *originalSeqStore;
    if (startIdx > 0) {
        resultSeqStore->sequences = originalSeqStore->sequencesStart + startIdx;
-       literalsBytesPreceding = ZSTD_countSeqStoreLiteralsBytes(resultSeqStore);
+       resultSeqStore->litStart += ZSTD_countSeqStoreLiteralsBytes(resultSeqStore);
    }

    /* Move longLengthPos into the correct position if necessary */
@@ -3328,13 +3932,12 @@ static void ZSTD_deriveSeqStoreChunk(seqStore_t* resultSeqStore,
    }
    resultSeqStore->sequencesStart = originalSeqStore->sequencesStart + startIdx;
    resultSeqStore->sequences = originalSeqStore->sequencesStart + endIdx;
-   literalsBytes = ZSTD_countSeqStoreLiteralsBytes(resultSeqStore);
-   resultSeqStore->litStart += literalsBytesPreceding;
    if (endIdx == (size_t)(originalSeqStore->sequences - originalSeqStore->sequencesStart)) {
        /* This accounts for possible last literals if the derived chunk reaches the end of the block */
-       resultSeqStore->lit = litEnd;
+       assert(resultSeqStore->lit == originalSeqStore->lit);
    } else {
-       resultSeqStore->lit = resultSeqStore->litStart+literalsBytes;
+       size_t const literalsBytes = ZSTD_countSeqStoreLiteralsBytes(resultSeqStore);
+       resultSeqStore->lit = resultSeqStore->litStart + literalsBytes;
    }
    resultSeqStore->llCode += startIdx;
    resultSeqStore->mlCode += startIdx;
@@ -3342,20 +3945,26 @@ static void ZSTD_deriveSeqStoreChunk(seqStore_t* resultSeqStore,
}

/*
- * Returns the raw offset represented by the combination of offCode, ll0, and repcode history.
- * offCode must represent a repcode in the numeric representation of ZSTD_storeSeq().
+ * Returns the raw offset represented by the combination of offBase, ll0, and repcode history.
+ * offBase must represent a repcode in the numeric representation of ZSTD_storeSeq().
 */
static U32
-ZSTD_resolveRepcodeToRawOffset(const U32 rep[ZSTD_REP_NUM], const U32 offCode, const U32 ll0)
-{
-   U32 const adjustedOffCode = STORED_REPCODE(offCode) - 1 + ll0;  /* [ 0 - 3 ] */
-   assert(STORED_IS_REPCODE(offCode));
-   if (adjustedOffCode == ZSTD_REP_NUM) {
-       /* litlength == 0 and offCode == 2 implies selection of first repcode - 1 */
-       assert(rep[0] > 0);
+ZSTD_resolveRepcodeToRawOffset(const U32 rep[ZSTD_REP_NUM], const U32 offBase, const U32 ll0)
+{
+   U32 const adjustedRepCode = OFFBASE_TO_REPCODE(offBase) - 1 + ll0;  /* [ 0 - 3 ] */
+   assert(OFFBASE_IS_REPCODE(offBase));
+   if (adjustedRepCode == ZSTD_REP_NUM) {
+       assert(ll0);
+       /* litlength == 0 and offCode == 2 implies selection of first repcode - 1
+        * This is only valid if it results in a valid offset value, aka > 0.
+        * Note : it may happen that `rep[0]==1` in exceptional circumstances.
+        * In which case this function will return 0, which is an invalid offset.
+        * It's not an issue though, since this value will be
+        * compared and discarded within ZSTD_seqStore_resolveOffCodes().
+        */
        return rep[0] - 1;
    }
-   return rep[adjustedOffCode];
+   return rep[adjustedRepCode];
}

/*
@@ -3371,30 +3980,33 @@ ZSTD_resolveRepcodeToRawOffset(const U32 rep[ZSTD_REP_NUM], const U32 offCode, c
 * 1-3 : repcode 1-3
 * 4+ : real_offset+3
 */
-static void ZSTD_seqStore_resolveOffCodes(repcodes_t* const dRepcodes, repcodes_t* const cRepcodes,
-                                          seqStore_t* const seqStore, U32 const nbSeq) {
+static void
+ZSTD_seqStore_resolveOffCodes(Repcodes_t* const dRepcodes, Repcodes_t* const cRepcodes,
+                              const SeqStore_t* const seqStore, U32 const nbSeq)
+{
    U32 idx = 0;
+   U32 const longLitLenIdx = seqStore->longLengthType == ZSTD_llt_literalLength ? seqStore->longLengthPos : nbSeq;
    for (; idx < nbSeq; ++idx) {
-       seqDef* const seq = seqStore->sequencesStart + idx;
-       U32 const ll0 = (seq->litLength == 0);
-       U32 const offCode = OFFBASE_TO_STORED(seq->offBase);
-       assert(seq->offBase > 0);
-       if (STORED_IS_REPCODE(offCode)) {
-           U32 const dRawOffset = ZSTD_resolveRepcodeToRawOffset(dRepcodes->rep, offCode, ll0);
-           U32 const cRawOffset = ZSTD_resolveRepcodeToRawOffset(cRepcodes->rep, offCode, ll0);
+       SeqDef* const seq = seqStore->sequencesStart + idx;
+       U32 const ll0 = (seq->litLength == 0) && (idx != longLitLenIdx);
+       U32 const offBase = seq->offBase;
+       assert(offBase > 0);
+       if (OFFBASE_IS_REPCODE(offBase)) {
+           U32 const dRawOffset = ZSTD_resolveRepcodeToRawOffset(dRepcodes->rep, offBase, ll0);
+           U32 const cRawOffset = ZSTD_resolveRepcodeToRawOffset(cRepcodes->rep, offBase, ll0);
            /* Adjust simulated decompression repcode history if we come across a mismatch. Replace
             * the repcode with the offset it actually references, determined by the compression
             * repcode history.
             */
            if (dRawOffset != cRawOffset) {
-               seq->offBase = cRawOffset + ZSTD_REP_NUM;
+               seq->offBase = OFFSET_TO_OFFBASE(cRawOffset);
            }
        }
        /* Compression repcode history is always updated with values directly from the unmodified seqStore.
         * Decompression repcode history may use modified seq->offset value taken from compression repcode history.
         */
-       ZSTD_updateRep(dRepcodes->rep, OFFBASE_TO_STORED(seq->offBase), ll0);
-       ZSTD_updateRep(cRepcodes->rep, offCode, ll0);
+       ZSTD_updateRep(dRepcodes->rep, seq->offBase, ll0);
+       ZSTD_updateRep(cRepcodes->rep, offBase, ll0);
    }
}

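A worked example of the resolution above: with repcode history rep = {8, 4, 2}, a sequence carrying repcode 1 with litLength > 0 resolves to offset 8 (rep[0]); the same repcode with litLength == 0 shifts by one and resolves to offset 4 (rep[1]); and repcode 3 with litLength == 0 hits the special case and resolves to rep[0] - 1 == 7.
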
/*
@@ -3404,10 +4016,11 @@ static void ZSTD_seqStore_resolveOffCodes(repcodes_t* const dRepcodes, repcodes_
 * Returns the total size of that block (including header) or a ZSTD error code.
 */
static size_t
-ZSTD_compressSeqStore_singleBlock(ZSTD_CCtx* zc, seqStore_t* const seqStore,
-                                  repcodes_t* const dRep, repcodes_t* const cRep,
+ZSTD_compressSeqStore_singleBlock(ZSTD_CCtx* zc,
+                                  const SeqStore_t* const seqStore,
+                                  Repcodes_t* const dRep, Repcodes_t* const cRep,
                                  void* dst, size_t dstCapacity,
-                                 const void* src, size_t srcSize,
+                                 const void* src, size_t srcSize,
                                  U32 lastBlock, U32 isPartition)
{
    const U32 rleMaxLength = 25;
@@ -3417,7 +4030,7 @@ ZSTD_compressSeqStore_singleBlock(ZSTD_CCtx* zc, seqStore_t* const seqStore,
    size_t cSeqsSize;

    /* In case of an RLE or raw block, the simulated decompression repcode history must be reset */
-   repcodes_t const dRepOriginal = *dRep;
+   Repcodes_t const dRepOriginal = *dRep;
    DEBUGLOG(5, "ZSTD_compressSeqStore_singleBlock");
    if (isPartition)
        ZSTD_seqStore_resolveOffCodes(dRep, cRep, seqStore, (U32)(seqStore->sequences - seqStore->sequencesStart));
@@ -3428,7 +4041,7 @@ ZSTD_compressSeqStore_singleBlock(ZSTD_CCtx* zc, seqStore_t* const seqStore,
                &zc->appliedParams,
                op + ZSTD_blockHeaderSize, dstCapacity - ZSTD_blockHeaderSize,
                srcSize,
-               zc->entropyWorkspace, ENTROPY_WORKSPACE_SIZE /* statically allocated in resetCCtx */,
+               zc->tmpWorkspace, zc->tmpWkspSize /* statically allocated in resetCCtx */,
                zc->bmi2);
    FORWARD_IF_ERROR(cSeqsSize, "ZSTD_entropyCompressSeqStore failed!");

@@ -3442,8 +4055,9 @@ ZSTD_compressSeqStore_singleBlock(ZSTD_CCtx* zc, seqStore_t* const seqStore,
        cSeqsSize = 1;
    }

+   /* Sequence collection not supported when block splitting */
    if (zc->seqCollector.collectSequences) {
-       ZSTD_copyBlockSequences(zc);
+       FORWARD_IF_ERROR(ZSTD_copyBlockSequences(&zc->seqCollector, seqStore, dRepOriginal.rep), "copyBlockSequences failed");
        ZSTD_blockState_confirmRepcodesAndEntropyTables(&zc->blockState);
        return 0;
    }
@@ -3451,18 +4065,18 @@ ZSTD_compressSeqStore_singleBlock(ZSTD_CCtx* zc, seqStore_t* const seqStore,
    if (cSeqsSize == 0) {
        cSize = ZSTD_noCompressBlock(op, dstCapacity, ip, srcSize, lastBlock);
        FORWARD_IF_ERROR(cSize, "Nocompress block failed");
-       DEBUGLOG(4, "Writing out nocompress block, size: %zu", cSize);
+       DEBUGLOG(5, "Writing out nocompress block, size: %zu", cSize);
        *dRep = dRepOriginal; /* reset simulated decompression repcode history */
    } else if (cSeqsSize == 1) {
        cSize = ZSTD_rleCompressBlock(op, dstCapacity, *ip, srcSize, lastBlock);
        FORWARD_IF_ERROR(cSize, "RLE compress block failed");
-       DEBUGLOG(4, "Writing out RLE block, size: %zu", cSize);
+       DEBUGLOG(5, "Writing out RLE block, size: %zu", cSize);
        *dRep = dRepOriginal; /* reset simulated decompression repcode history */
    } else {
        ZSTD_blockState_confirmRepcodesAndEntropyTables(&zc->blockState);
        writeBlockHeader(op, cSeqsSize, srcSize, lastBlock);
        cSize = ZSTD_blockHeaderSize + cSeqsSize;
-       DEBUGLOG(4, "Writing out compressed block, size: %zu", cSize);
+       DEBUGLOG(5, "Writing out compressed block, size: %zu", cSize);
    }

    if (zc->blockState.prevCBlock->entropy.fse.offcode_repeatMode == FSE_repeat_valid)
@@ -3481,45 +4095,49 @@ typedef struct {

/* Helper function to perform the recursive search for block splits.
 * Estimates the cost of seqStore prior to split, and estimates the cost of splitting the sequences in half.
- * If advantageous to split, then we recurse down the two sub-blocks. If not, or if an error occurred in estimation, then
- * we do not recurse.
+ * If advantageous to split, then we recurse down the two sub-blocks.
+ * If not, or if an error occurred in estimation, then we do not recurse.
 *
- * Note: The recursion depth is capped by a heuristic minimum number of sequences, defined by MIN_SEQUENCES_BLOCK_SPLITTING.
+ * Note: The recursion depth is capped by a heuristic minimum number of sequences,
+ * defined by MIN_SEQUENCES_BLOCK_SPLITTING.
 * In theory, this means the absolute largest recursion depth is 10 == log2(maxNbSeqInBlock/MIN_SEQUENCES_BLOCK_SPLITTING).
 * In practice, recursion depth usually doesn't go beyond 4.
 *
- * Furthermore, the number of splits is capped by ZSTD_MAX_NB_BLOCK_SPLITS. At ZSTD_MAX_NB_BLOCK_SPLITS == 196 with the current existing blockSize
+ * Furthermore, the number of splits is capped by ZSTD_MAX_NB_BLOCK_SPLITS.
+ * At ZSTD_MAX_NB_BLOCK_SPLITS == 196 with the current existing blockSize
 * maximum of 128 KB, this value is actually impossible to reach.
 */
static void
ZSTD_deriveBlockSplitsHelper(seqStoreSplits* splits, size_t startIdx, size_t endIdx,
-                            ZSTD_CCtx* zc, const seqStore_t* origSeqStore)
+                            ZSTD_CCtx* zc, const SeqStore_t* origSeqStore)
{
-   seqStore_t* fullSeqStoreChunk = &zc->blockSplitCtx.fullSeqStoreChunk;
-   seqStore_t* firstHalfSeqStore = &zc->blockSplitCtx.firstHalfSeqStore;
-   seqStore_t* secondHalfSeqStore = &zc->blockSplitCtx.secondHalfSeqStore;
+   SeqStore_t* const fullSeqStoreChunk = &zc->blockSplitCtx.fullSeqStoreChunk;
+   SeqStore_t* const firstHalfSeqStore = &zc->blockSplitCtx.firstHalfSeqStore;
+   SeqStore_t* const secondHalfSeqStore = &zc->blockSplitCtx.secondHalfSeqStore;
    size_t estimatedOriginalSize;
    size_t estimatedFirstHalfSize;
    size_t estimatedSecondHalfSize;
    size_t midIdx = (startIdx + endIdx)/2;

+   DEBUGLOG(5, "ZSTD_deriveBlockSplitsHelper: startIdx=%zu endIdx=%zu", startIdx, endIdx);
+   assert(endIdx >= startIdx);
    if (endIdx - startIdx < MIN_SEQUENCES_BLOCK_SPLITTING || splits->idx >= ZSTD_MAX_NB_BLOCK_SPLITS) {
-       DEBUGLOG(6, "ZSTD_deriveBlockSplitsHelper: Too few sequences");
+       DEBUGLOG(6, "ZSTD_deriveBlockSplitsHelper: Too few sequences (%zu)", endIdx - startIdx);
        return;
    }
-   DEBUGLOG(4, "ZSTD_deriveBlockSplitsHelper: startIdx=%zu endIdx=%zu", startIdx, endIdx);
    ZSTD_deriveSeqStoreChunk(fullSeqStoreChunk, origSeqStore, startIdx, endIdx);
    ZSTD_deriveSeqStoreChunk(firstHalfSeqStore, origSeqStore, startIdx, midIdx);
    ZSTD_deriveSeqStoreChunk(secondHalfSeqStore, origSeqStore, midIdx, endIdx);
    estimatedOriginalSize = ZSTD_buildEntropyStatisticsAndEstimateSubBlockSize(fullSeqStoreChunk, zc);
    estimatedFirstHalfSize = ZSTD_buildEntropyStatisticsAndEstimateSubBlockSize(firstHalfSeqStore, zc);
    estimatedSecondHalfSize = ZSTD_buildEntropyStatisticsAndEstimateSubBlockSize(secondHalfSeqStore, zc);
-   DEBUGLOG(4, "Estimated original block size: %zu -- First half split: %zu -- Second half split: %zu",
+   DEBUGLOG(5, "Estimated original block size: %zu -- First half split: %zu -- Second half split: %zu",
            estimatedOriginalSize, estimatedFirstHalfSize, estimatedSecondHalfSize);
    if (ZSTD_isError(estimatedOriginalSize) || ZSTD_isError(estimatedFirstHalfSize) || ZSTD_isError(estimatedSecondHalfSize)) {
        return;
    }
    if (estimatedFirstHalfSize + estimatedSecondHalfSize < estimatedOriginalSize) {
+       DEBUGLOG(5, "split decided at seqNb:%zu", midIdx);
        ZSTD_deriveBlockSplitsHelper(splits, startIdx, midIdx, zc, origSeqStore);
        splits->splitLocations[splits->idx] = (U32)midIdx;
        splits->idx++;
@@ -3527,14 +4145,18 @@ ZSTD_deriveBlockSplitsHelper(seqStoreSplits* splits, size_t startIdx, size_t end
    }
}

-/* Base recursive function. Populates a table with intra-block partition indices that can improve compression ratio.
+/* Base recursive function.
+ * Populates a table with intra-block partition indices that can improve compression ratio.
 *
- * Returns the number of splits made (which equals the size of the partition table - 1).
+ * @return: number of splits made (which equals the size of the partition table - 1).
 */
-static size_t ZSTD_deriveBlockSplits(ZSTD_CCtx* zc, U32 partitions[], U32 nbSeq) {
-   seqStoreSplits splits = {partitions, 0};
+static size_t ZSTD_deriveBlockSplits(ZSTD_CCtx* zc, U32 partitions[], U32 nbSeq)
+{
+   seqStoreSplits splits;
+   splits.splitLocations = partitions;
+   splits.idx = 0;
    if (nbSeq <= 4) {
-       DEBUGLOG(4, "ZSTD_deriveBlockSplits: Too few sequences to split");
+       DEBUGLOG(5, "ZSTD_deriveBlockSplits: Too few sequences to split (%u <= 4)", nbSeq);
        /* Refuse to try and split anything with less than 4 sequences */
        return 0;
    }
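The helper above amounts to a midpoint bisection driven by the entropy estimator; its shape, with details elided:

    deriveSplits(lo, hi):
        if (hi - lo < MIN_SEQUENCES_BLOCK_SPLITTING) return;
        mid = (lo + hi) / 2;
        if (estimate(lo, mid) + estimate(mid, hi) < estimate(lo, hi)) {
            deriveSplits(lo, mid);
            record(mid);          /* split points emitted in increasing order */
            deriveSplits(mid, hi);
        }

so at most ZSTD_MAX_NB_BLOCK_SPLITS split points are recorded, already sorted for the partition loop that follows.
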
*/ - repcodes_t dRep; - repcodes_t cRep; - ZSTD_memcpy(dRep.rep, zc->blockState.prevCBlock->rep, sizeof(repcodes_= t)); - ZSTD_memcpy(cRep.rep, zc->blockState.prevCBlock->rep, sizeof(repcodes_= t)); - ZSTD_memset(nextSeqStore, 0, sizeof(seqStore_t)); + Repcodes_t dRep; + Repcodes_t cRep; + ZSTD_memcpy(dRep.rep, zc->blockState.prevCBlock->rep, sizeof(Repcodes_= t)); + ZSTD_memcpy(cRep.rep, zc->blockState.prevCBlock->rep, sizeof(Repcodes_= t)); + ZSTD_memset(nextSeqStore, 0, sizeof(SeqStore_t)); =20 - DEBUGLOG(4, "ZSTD_compressBlock_splitBlock_internal (dstCapacity=3D%u,= dictLimit=3D%u, nextToUpdate=3D%u)", + DEBUGLOG(5, "ZSTD_compressBlock_splitBlock_internal (dstCapacity=3D%u,= dictLimit=3D%u, nextToUpdate=3D%u)", (unsigned)dstCapacity, (unsigned)zc->blockState.matchState= .window.dictLimit, (unsigned)zc->blockState.matchState.nextToUpdate); =20 if (numSplits =3D=3D 0) { - size_t cSizeSingleBlock =3D ZSTD_compressSeqStore_singleBlock(zc, = &zc->seqStore, - &dRep, = &cRep, - op, ds= tCapacity, - ip, bl= ockSize, - lastBl= ock, 0 /* isPartition */); + size_t cSizeSingleBlock =3D + ZSTD_compressSeqStore_singleBlock(zc, &zc->seqStore, + &dRep, &cRep, + op, dstCapacity, + ip, blockSize, + lastBlock, 0 /* isPartition */= ); FORWARD_IF_ERROR(cSizeSingleBlock, "Compressing single block from = splitBlock_internal() failed!"); DEBUGLOG(5, "ZSTD_compressBlock_splitBlock_internal: No splits"); - assert(cSizeSingleBlock <=3D ZSTD_BLOCKSIZE_MAX + ZSTD_blockHeader= Size); + assert(zc->blockSizeMax <=3D ZSTD_BLOCKSIZE_MAX); + assert(cSizeSingleBlock <=3D zc->blockSizeMax + ZSTD_blockHeaderSi= ze); return cSizeSingleBlock; } =20 ZSTD_deriveSeqStoreChunk(currSeqStore, &zc->seqStore, 0, partitions[0]= ); for (i =3D 0; i <=3D numSplits; ++i) { - size_t srcBytes; size_t cSizeChunk; U32 const lastPartition =3D (i =3D=3D numSplits); U32 lastBlockEntireSrc =3D 0; =20 - srcBytes =3D ZSTD_countSeqStoreLiteralsBytes(currSeqStore) + ZSTD_= countSeqStoreMatchBytes(currSeqStore); + size_t srcBytes =3D ZSTD_countSeqStoreLiteralsBytes(currSeqStore) = + ZSTD_countSeqStoreMatchBytes(currSeqStore); srcBytesTotal +=3D srcBytes; if (lastPartition) { /* This is the final partition, need to account for possible l= ast literals */ @@ -3621,7 +4246,8 @@ ZSTD_compressBlock_splitBlock_internal(ZSTD_CCtx* zc,= void* dst, size_t dstCapac op, dstCapacity, ip, srcBytes, lastBlockEntireSrc,= 1 /* isPartition */); - DEBUGLOG(5, "Estimated size: %zu actual size: %zu", ZSTD_buildEntr= opyStatisticsAndEstimateSubBlockSize(currSeqStore, zc), cSizeChunk); + DEBUGLOG(5, "Estimated size: %zu vs %zu : actual size", + ZSTD_buildEntropyStatisticsAndEstimateSubBlockSize(cur= rSeqStore, zc), cSizeChunk); FORWARD_IF_ERROR(cSizeChunk, "Compressing chunk failed!"); =20 ip +=3D srcBytes; @@ -3629,12 +4255,12 @@ ZSTD_compressBlock_splitBlock_internal(ZSTD_CCtx* z= c, void* dst, size_t dstCapac dstCapacity -=3D cSizeChunk; cSize +=3D cSizeChunk; *currSeqStore =3D *nextSeqStore; - assert(cSizeChunk <=3D ZSTD_BLOCKSIZE_MAX + ZSTD_blockHeaderSize); + assert(cSizeChunk <=3D zc->blockSizeMax + ZSTD_blockHeaderSize); } - /* cRep and dRep may have diverged during the compression. If so, we u= se the dRep repcodes - * for the next block. + /* cRep and dRep may have diverged during the compression. + * If so, we use the dRep repcodes for the next block. 
*/ - ZSTD_memcpy(zc->blockState.prevCBlock->rep, dRep.rep, sizeof(repcodes_= t)); + ZSTD_memcpy(zc->blockState.prevCBlock->rep, dRep.rep, sizeof(Repcodes_= t)); return cSize; } =20 @@ -3643,21 +4269,20 @@ ZSTD_compressBlock_splitBlock(ZSTD_CCtx* zc, void* dst, size_t dstCapacity, const void* src, size_t srcSize, U32 lastBlo= ck) { - const BYTE* ip =3D (const BYTE*)src; - BYTE* op =3D (BYTE*)dst; U32 nbSeq; size_t cSize; - DEBUGLOG(4, "ZSTD_compressBlock_splitBlock"); - assert(zc->appliedParams.useBlockSplitter =3D=3D ZSTD_ps_enable); + DEBUGLOG(5, "ZSTD_compressBlock_splitBlock"); + assert(zc->appliedParams.postBlockSplitter =3D=3D ZSTD_ps_enable); =20 { const size_t bss =3D ZSTD_buildSeqStore(zc, src, srcSize); FORWARD_IF_ERROR(bss, "ZSTD_buildSeqStore failed"); if (bss =3D=3D ZSTDbss_noCompress) { if (zc->blockState.prevCBlock->entropy.fse.offcode_repeatMode = =3D=3D FSE_repeat_valid) zc->blockState.prevCBlock->entropy.fse.offcode_repeatMode = =3D FSE_repeat_check; - cSize =3D ZSTD_noCompressBlock(op, dstCapacity, ip, srcSize, l= astBlock); + RETURN_ERROR_IF(zc->seqCollector.collectSequences, sequencePro= ducer_failed, "Uncompressible block"); + cSize =3D ZSTD_noCompressBlock(dst, dstCapacity, src, srcSize,= lastBlock); FORWARD_IF_ERROR(cSize, "ZSTD_noCompressBlock failed"); - DEBUGLOG(4, "ZSTD_compressBlock_splitBlock: Nocompress block"); + DEBUGLOG(5, "ZSTD_compressBlock_splitBlock: Nocompress block"); return cSize; } nbSeq =3D (U32)(zc->seqStore.sequences - zc->seqStore.sequencesSta= rt); @@ -3673,9 +4298,9 @@ ZSTD_compressBlock_internal(ZSTD_CCtx* zc, void* dst, size_t dstCapacity, const void* src, size_t srcSize, U32 frame) { - /* This the upper bound for the length of an rle block. - * This isn't the actual upper bound. Finding the real threshold - * needs further investigation. + /* This is an estimated upper bound for the length of an rle block. + * This isn't the actual upper bound. + * Finding the real threshold needs further investigation. */ const U32 rleMaxLength =3D 25; size_t cSize; @@ -3687,11 +4312,15 @@ ZSTD_compressBlock_internal(ZSTD_CCtx* zc, =20 { const size_t bss =3D ZSTD_buildSeqStore(zc, src, srcSize); FORWARD_IF_ERROR(bss, "ZSTD_buildSeqStore failed"); - if (bss =3D=3D ZSTDbss_noCompress) { cSize =3D 0; goto out; } + if (bss =3D=3D ZSTDbss_noCompress) { + RETURN_ERROR_IF(zc->seqCollector.collectSequences, sequencePro= ducer_failed, "Uncompressible block"); + cSize =3D 0; + goto out; + } } =20 if (zc->seqCollector.collectSequences) { - ZSTD_copyBlockSequences(zc); + FORWARD_IF_ERROR(ZSTD_copyBlockSequences(&zc->seqCollector, ZSTD_g= etSeqStore(zc), zc->blockState.prevCBlock->rep), "copyBlockSequences failed= "); ZSTD_blockState_confirmRepcodesAndEntropyTables(&zc->blockState); return 0; } @@ -3702,7 +4331,7 @@ ZSTD_compressBlock_internal(ZSTD_CCtx* zc, &zc->appliedParams, dst, dstCapacity, srcSize, - zc->entropyWorkspace, ENTROPY_WORKSPACE_SIZE /* statically all= ocated in resetCCtx */, + zc->tmpWorkspace, zc->tmpWkspSize /* statically allocated in r= esetCCtx */, zc->bmi2); =20 if (frame && @@ -3767,10 +4396,11 @@ static size_t ZSTD_compressBlock_targetCBlockSize_b= ody(ZSTD_CCtx* zc, * * cSize >=3D blockBound(srcSize): We have expanded the block = too much so * emit an uncompressed block. 
*/ - { - size_t const cSize =3D ZSTD_compressSuperBlock(zc, dst, dstCap= acity, src, srcSize, lastBlock); + { size_t const cSize =3D + ZSTD_compressSuperBlock(zc, dst, dstCapacity, src, srcSize= , lastBlock); if (cSize !=3D ERROR(dstSize_tooSmall)) { - size_t const maxCSize =3D srcSize - ZSTD_minGain(srcSize, = zc->appliedParams.cParams.strategy); + size_t const maxCSize =3D + srcSize - ZSTD_minGain(srcSize, zc->appliedParams.cPar= ams.strategy); FORWARD_IF_ERROR(cSize, "ZSTD_compressSuperBlock failed"); if (cSize !=3D 0 && cSize < maxCSize + ZSTD_blockHeaderSiz= e) { ZSTD_blockState_confirmRepcodesAndEntropyTables(&zc->b= lockState); @@ -3778,7 +4408,7 @@ static size_t ZSTD_compressBlock_targetCBlockSize_bod= y(ZSTD_CCtx* zc, } } } - } + } /* if (bss =3D=3D ZSTDbss_compress)*/ =20 DEBUGLOG(6, "Resorting to ZSTD_noCompressBlock()"); /* Superblock compression failed, attempt to emit a single no compress= block. @@ -3807,7 +4437,7 @@ static size_t ZSTD_compressBlock_targetCBlockSize(ZST= D_CCtx* zc, return cSize; } =20 -static void ZSTD_overflowCorrectIfNeeded(ZSTD_matchState_t* ms, +static void ZSTD_overflowCorrectIfNeeded(ZSTD_MatchState_t* ms, ZSTD_cwksp* ws, ZSTD_CCtx_params const* params, void const* ip, @@ -3831,39 +4461,82 @@ static void ZSTD_overflowCorrectIfNeeded(ZSTD_match= State_t* ms, } } =20 +#include "zstd_preSplit.h" + +static size_t ZSTD_optimalBlockSize(ZSTD_CCtx* cctx, const void* src, size= _t srcSize, size_t blockSizeMax, int splitLevel, ZSTD_strategy strat, S64 s= avings) +{ + /* split level based on compression strategy, from `fast` to `btultra2= ` */ + static const int splitLevels[] =3D { 0, 0, 1, 2, 2, 3, 3, 4, 4, 4 }; + /* note: conservatively only split full blocks (128 KB) currently. + * While it's possible to go lower, let's keep it simple for a first i= mplementation. + * Besides, benefits of splitting are reduced when blocks are already = small. + */ + if (srcSize < 128 KB || blockSizeMax < 128 KB) + return MIN(srcSize, blockSizeMax); + /* do not split incompressible data though: + * require verified savings to allow pre-splitting. + * Note: as a consequence, the first full block is not split. + */ + if (savings < 3) { + DEBUGLOG(6, "don't attempt splitting: savings (%i) too low", (int)= savings); + return 128 KB; + } + /* apply @splitLevel, or use default value (which depends on @strat). + * note that splitting heuristic is still conditioned by @savings >=3D= 3, + * so the first block will not reach this code path */ + if (splitLevel =3D=3D 1) return 128 KB; + if (splitLevel =3D=3D 0) { + assert(ZSTD_fast <=3D strat && strat <=3D ZSTD_btultra2); + splitLevel =3D splitLevels[strat]; + } else { + assert(2 <=3D splitLevel && splitLevel <=3D 6); + splitLevel -=3D 2; + } + return ZSTD_splitBlock(src, blockSizeMax, splitLevel, cctx->tmpWorkspa= ce, cctx->tmpWkspSize); +} + /*! ZSTD_compress_frameChunk() : * Compress a chunk of data into one or multiple blocks. * All blocks will be terminated, all input will be consumed. * Function will issue an error if there is not enough `dstCapacity` to h= old the compressed content. 
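Read as a decision table, ZSTD_optimalBlockSize() behaves roughly as:

    input or blockSizeMax < 128 KB   -> no pre-split, one plain block
    accumulated savings < 3 bytes    -> emit a full 128 KB block
    splitLevel == 1                  -> pre-splitting disabled, full block
    splitLevel == 0                  -> level derived from the strategy table
    splitLevel in 2..6               -> user-chosen aggressiveness (minus 2)

and the caller below refreshes savings after every block as (blockSize - cSize), so incompressible data quickly drops below the threshold and stops being split.
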
 /*! ZSTD_compress_frameChunk() :
 *   Compress a chunk of data into one or multiple blocks.
 *   All blocks will be terminated, all input will be consumed.
 *   Function will issue an error if there is not enough `dstCapacity` to hold the compressed content.
 *   Frame is supposed already started (header already produced)
-*  @return : compressed size, or an error code
+*  @return : compressed size, or an error code */
 static size_t ZSTD_compress_frameChunk(ZSTD_CCtx* cctx,
                                        void* dst, size_t dstCapacity,
                                        const void* src, size_t srcSize,
                                        U32 lastFrameChunk)
 {
-    size_t blockSize = cctx->blockSize;
+    size_t blockSizeMax = cctx->blockSizeMax;
     size_t remaining = srcSize;
     const BYTE* ip = (const BYTE*)src;
     BYTE* const ostart = (BYTE*)dst;
     BYTE* op = ostart;
     U32 const maxDist = (U32)1 << cctx->appliedParams.cParams.windowLog;
+    S64 savings = (S64)cctx->consumedSrcSize - (S64)cctx->producedCSize;
 
     assert(cctx->appliedParams.cParams.windowLog <= ZSTD_WINDOWLOG_MAX);
 
-    DEBUGLOG(4, "ZSTD_compress_frameChunk (blockSize=%u)", (unsigned)blockSize);
+    DEBUGLOG(5, "ZSTD_compress_frameChunk (srcSize=%u, blockSizeMax=%u)", (unsigned)srcSize, (unsigned)blockSizeMax);
     if (cctx->appliedParams.fParams.checksumFlag && srcSize)
         xxh64_update(&cctx->xxhState, src, srcSize);
 
     while (remaining) {
-        ZSTD_matchState_t* const ms = &cctx->blockState.matchState;
-        U32 const lastBlock = lastFrameChunk & (blockSize >= remaining);
-
-        RETURN_ERROR_IF(dstCapacity < ZSTD_blockHeaderSize + MIN_CBLOCK_SIZE,
+        ZSTD_MatchState_t* const ms = &cctx->blockState.matchState;
+        size_t const blockSize = ZSTD_optimalBlockSize(cctx,
+                                        ip, remaining,
+                                        blockSizeMax,
+                                        cctx->appliedParams.preBlockSplitter_level,
+                                        cctx->appliedParams.cParams.strategy,
+                                        savings);
+        U32 const lastBlock = lastFrameChunk & (blockSize == remaining);
+        assert(blockSize <= remaining);
+
+        /* TODO: See 3090. We reduced MIN_CBLOCK_SIZE from 3 to 2 so to compensate we are adding
+         * additional 1. We need to revisit and change this logic to be more consistent */
+        RETURN_ERROR_IF(dstCapacity < ZSTD_blockHeaderSize + MIN_CBLOCK_SIZE + 1,
                         dstSize_tooSmall,
                         "not enough space to store compressed block");
-        if (remaining < blockSize) blockSize = remaining;
 
         ZSTD_overflowCorrectIfNeeded(
             ms, &cctx->workspace, &cctx->appliedParams, ip, ip + blockSize);
@@ -3899,8 +4572,23 @@ static size_t ZSTD_compress_frameChunk(ZSTD_CCtx* cctx,
                 MEM_writeLE24(op, cBlockHeader);
                 cSize += ZSTD_blockHeaderSize;
             }
-        }
-
+        }  /* if (ZSTD_useTargetCBlockSize(&cctx->appliedParams))*/
+
+        /* @savings is employed to ensure that splitting doesn't worsen expansion of incompressible data.
+         * Without splitting, the maximum expansion is 3 bytes per full block.
+         * An adversarial input could attempt to fudge the split detector,
+         * and make it split incompressible data, resulting in more block headers.
+         * Note that, since ZSTD_COMPRESSBOUND() assumes a worst case scenario of 1KB per block,
+         * and the splitter never creates blocks that small (current lower limit is 8 KB),
+         * there is already no risk to expand beyond ZSTD_COMPRESSBOUND() limit.
+         * But if the goal is to not expand by more than 3-bytes per 128 KB full block,
+         * then yes, it becomes possible to make the block splitter oversplit incompressible data.
+         * Using @savings, we enforce an even more conservative condition,
+         * requiring the presence of enough savings (at least 3 bytes) to authorize splitting,
+         * otherwise only full blocks are used.
+         * But being conservative is fine,
+         * since splitting barely compressible blocks is not fruitful anyway */
+        savings += (S64)blockSize - (S64)cSize;
 
         ip += blockSize;
         assert(remaining >= blockSize);
@@ -3919,8 +4607,10 @@ static size_t ZSTD_compress_frameChunk(ZSTD_CCtx* cctx,
 
 
 static size_t ZSTD_writeFrameHeader(void* dst, size_t dstCapacity,
-                                    const ZSTD_CCtx_params* params, U64 pledgedSrcSize, U32 dictID)
-{   BYTE* const op = (BYTE*)dst;
+                                    const ZSTD_CCtx_params* params,
+                                    U64 pledgedSrcSize, U32 dictID)
+{
+    BYTE* const op = (BYTE*)dst;
     U32   const dictIDSizeCodeLength = (dictID>0) + (dictID>=256) + (dictID>=65536);   /* 0-3 */
     U32   const dictIDSizeCode = params->fParams.noDictIDFlag ? 0 : dictIDSizeCodeLength;   /* 0-3 */
     U32   const checksumFlag = params->fParams.checksumFlag>0;
@@ -4001,19 +4691,15 @@ size_t ZSTD_writeLastEmptyBlock(void* dst, size_t dstCapacity)
     }
 }
 
-size_t ZSTD_referenceExternalSequences(ZSTD_CCtx* cctx, rawSeq* seq, size_t nbSeq)
+void ZSTD_referenceExternalSequences(ZSTD_CCtx* cctx, rawSeq* seq, size_t nbSeq)
 {
-    RETURN_ERROR_IF(cctx->stage != ZSTDcs_init, stage_wrong,
-                    "wrong cctx stage");
-    RETURN_ERROR_IF(cctx->appliedParams.ldmParams.enableLdm == ZSTD_ps_enable,
-                    parameter_unsupported,
-                    "incompatible with ldm");
+    assert(cctx->stage == ZSTDcs_init);
+    assert(nbSeq == 0 || cctx->appliedParams.ldmParams.enableLdm != ZSTD_ps_enable);
     cctx->externSeqStore.seq = seq;
     cctx->externSeqStore.size = nbSeq;
     cctx->externSeqStore.capacity = nbSeq;
     cctx->externSeqStore.pos = 0;
     cctx->externSeqStore.posInSequence = 0;
-    return 0;
 }
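A worked example of the @savings accounting above, with illustrative
numbers: if the first full block (131072 bytes) compresses to 60000 bytes,
savings becomes 131072 - 60000 = 71072, and subsequent full blocks become
eligible for pre-splitting. If instead that block is incompressible and
stored raw, cSize is 131075 (the 3-byte block header included), savings
drops by 3, and ZSTD_optimalBlockSize() keeps returning plain 128 KB
blocks, which caps worst-case expansion at 3 bytes per full block.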
 
 
@@ -4022,7 +4708,7 @@ static size_t ZSTD_compressContinue_internal (ZSTD_CCtx* cctx,
         const void* src, size_t srcSize,
         U32 frame, U32 lastFrameChunk)
 {
-    ZSTD_matchState_t* const ms = &cctx->blockState.matchState;
+    ZSTD_MatchState_t* const ms = &cctx->blockState.matchState;
     size_t fhSize = 0;
 
     DEBUGLOG(5, "ZSTD_compressContinue_internal, stage: %u, srcSize: %u",
@@ -4057,7 +4743,7 @@ static size_t ZSTD_compressContinue_internal (ZSTD_CCtx* cctx,
                 src, (BYTE const*)src + srcSize);
     }
 
-    DEBUGLOG(5, "ZSTD_compressContinue_internal (blockSize=%u)", (unsigned)cctx->blockSize);
+    DEBUGLOG(5, "ZSTD_compressContinue_internal (blockSize=%u)", (unsigned)cctx->blockSizeMax);
     {   size_t const cSize = frame ?
                              ZSTD_compress_frameChunk (cctx, dst, dstCapacity, src, srcSize, lastFrameChunk) :
                              ZSTD_compressBlock_internal (cctx, dst, dstCapacity, src, srcSize, 0 /* frame */);
@@ -4078,58 +4764,90 @@ static size_t ZSTD_compressContinue_internal (ZSTD_CCtx* cctx,
     }
 }
 
-size_t ZSTD_compressContinue (ZSTD_CCtx* cctx,
-                             void* dst, size_t dstCapacity,
-                             const void* src, size_t srcSize)
+size_t ZSTD_compressContinue_public(ZSTD_CCtx* cctx,
+                                    void* dst, size_t dstCapacity,
+                                    const void* src, size_t srcSize)
 {
     DEBUGLOG(5, "ZSTD_compressContinue (srcSize=%u)", (unsigned)srcSize);
     return ZSTD_compressContinue_internal(cctx, dst, dstCapacity, src, srcSize, 1 /* frame mode */, 0 /* last chunk */);
 }
 
+/* NOTE: Must just wrap ZSTD_compressContinue_public() */
+size_t ZSTD_compressContinue(ZSTD_CCtx* cctx,
+                             void* dst, size_t dstCapacity,
+                             const void* src, size_t srcSize)
+{
+    return ZSTD_compressContinue_public(cctx, dst, dstCapacity, src, srcSize);
+}
 
-size_t ZSTD_getBlockSize(const ZSTD_CCtx* cctx)
+static size_t ZSTD_getBlockSize_deprecated(const ZSTD_CCtx* cctx)
 {
     ZSTD_compressionParameters const cParams = cctx->appliedParams.cParams;
     assert(!ZSTD_checkCParams(cParams));
-    return MIN (ZSTD_BLOCKSIZE_MAX, (U32)1 << cParams.windowLog);
+    return MIN(cctx->appliedParams.maxBlockSize, (size_t)1 << cParams.windowLog);
 }
 
-size_t ZSTD_compressBlock(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize)
+/* NOTE: Must just wrap ZSTD_getBlockSize_deprecated() */
+size_t ZSTD_getBlockSize(const ZSTD_CCtx* cctx)
+{
+    return ZSTD_getBlockSize_deprecated(cctx);
+}
+
+/* NOTE: Must just wrap ZSTD_compressBlock_deprecated() */
+size_t ZSTD_compressBlock_deprecated(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize)
 {
     DEBUGLOG(5, "ZSTD_compressBlock: srcSize = %u", (unsigned)srcSize);
-    { size_t const blockSizeMax = ZSTD_getBlockSize(cctx);
+    { size_t const blockSizeMax = ZSTD_getBlockSize_deprecated(cctx);
       RETURN_ERROR_IF(srcSize > blockSizeMax, srcSize_wrong, "input is larger than a block"); }
 
     return ZSTD_compressContinue_internal(cctx, dst, dstCapacity, src, srcSize, 0 /* frame mode */, 0 /* last chunk */);
 }
 
+/* NOTE: Must just wrap ZSTD_compressBlock_deprecated() */
+size_t ZSTD_compressBlock(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize)
+{
+    return ZSTD_compressBlock_deprecated(cctx, dst, dstCapacity, src, srcSize);
+}
+
 /*! ZSTD_loadDictionaryContent() :
  *  @return : 0, or an error code
  */
-static size_t ZSTD_loadDictionaryContent(ZSTD_matchState_t* ms,
-                                         ldmState_t* ls,
-                                         ZSTD_cwksp* ws,
-                                         ZSTD_CCtx_params const* params,
-                                         const void* src, size_t srcSize,
-                                         ZSTD_dictTableLoadMethod_e dtlm)
+static size_t
+ZSTD_loadDictionaryContent(ZSTD_MatchState_t* ms,
+                           ldmState_t* ls,
+                           ZSTD_cwksp* ws,
+                           ZSTD_CCtx_params const* params,
+                           const void* src, size_t srcSize,
+                           ZSTD_dictTableLoadMethod_e dtlm,
+                           ZSTD_tableFillPurpose_e tfp)
 {
     const BYTE* ip = (const BYTE*) src;
     const BYTE* const iend = ip + srcSize;
     int const loadLdmDict = params->ldmParams.enableLdm == ZSTD_ps_enable && ls != NULL;
 
-    /* Assert that we the ms params match the params we're being given */
+    /* Assert that the ms params match the params we're being given */
     ZSTD_assertEqualCParams(params->cParams, ms->cParams);
 
-    if (srcSize > ZSTD_CHUNKSIZE_MAX) {
+    {   /* Ensure large dictionaries can't cause index overflow */
+        /* Allow the dictionary to set indices up to exactly ZSTD_CURRENT_MAX.
         * Dictionaries right at the edge will immediately trigger overflow
         * correction, but I don't want to insert extra constraints here. */
-        U32 const maxDictSize = ZSTD_CURRENT_MAX - 1;
-        /* We must have cleared our windows when our source is this large. */
-        assert(ZSTD_window_isEmpty(ms->window));
-        if (loadLdmDict)
-            assert(ZSTD_window_isEmpty(ls->window));
+        U32 maxDictSize = ZSTD_CURRENT_MAX - ZSTD_WINDOW_START_INDEX;
+
+        int const CDictTaggedIndices = ZSTD_CDictIndicesAreTagged(&params->cParams);
+        if (CDictTaggedIndices && tfp == ZSTD_tfp_forCDict) {
+            /* Some dictionary matchfinders in zstd use "short cache",
+             * which treats the lower ZSTD_SHORT_CACHE_TAG_BITS of each
+             * CDict hashtable entry as a tag rather than as part of an index.
+             * When short cache is used, we need to truncate the dictionary
+             * so that its indices don't overlap with the tag. */
+            U32 const shortCacheMaxDictSize = (1u << (32 - ZSTD_SHORT_CACHE_TAG_BITS)) - ZSTD_WINDOW_START_INDEX;
+            maxDictSize = MIN(maxDictSize, shortCacheMaxDictSize);
+            assert(!loadLdmDict);
+        }
+
         /* If the dictionary is too large, only load the suffix of the dictionary. */
         if (srcSize > maxDictSize) {
             ip = iend - maxDictSize;
@@ -4138,35 +4856,59 @@ static size_t ZSTD_loadDictionaryContent(ZSTD_matchState_t* ms,
         }
     }
 
-    DEBUGLOG(4, "ZSTD_loadDictionaryContent(): useRowMatchFinder=%d", (int)params->useRowMatchFinder);
+    if (srcSize > ZSTD_CHUNKSIZE_MAX) {
+        /* We must have cleared our windows when our source is this large. */
+        assert(ZSTD_window_isEmpty(ms->window));
+        if (loadLdmDict) assert(ZSTD_window_isEmpty(ls->window));
+    }
     ZSTD_window_update(&ms->window, src, srcSize, /* forceNonContiguous */ 0);
-    ms->loadedDictEnd = params->forceWindow ? 0 : (U32)(iend - ms->window.base);
-    ms->forceNonContiguous = params->deterministicRefPrefix;
 
-    if (loadLdmDict) {
+    DEBUGLOG(4, "ZSTD_loadDictionaryContent: useRowMatchFinder=%d", (int)params->useRowMatchFinder);
+
+    if (loadLdmDict) { /* Load the entire dict into LDM matchfinders. */
+        DEBUGLOG(4, "ZSTD_loadDictionaryContent: Trigger loadLdmDict");
         ZSTD_window_update(&ls->window, src, srcSize, /* forceNonContiguous */ 0);
         ls->loadedDictEnd = params->forceWindow ? 0 : (U32)(iend - ls->window.base);
+        ZSTD_ldm_fillHashTable(ls, ip, iend, &params->ldmParams);
+        DEBUGLOG(4, "ZSTD_loadDictionaryContent: ZSTD_ldm_fillHashTable completes");
+    }
+
+    /* If the dict is larger than we can reasonably index in our tables, only load the suffix. */
+    {   U32 maxDictSize = 1U << MIN(MAX(params->cParams.hashLog + 3, params->cParams.chainLog + 1), 31);
+        if (srcSize > maxDictSize) {
+            ip = iend - maxDictSize;
+            src = ip;
+            srcSize = maxDictSize;
+    }   }
 
+    ms->nextToUpdate = (U32)(ip - ms->window.base);
+    ms->loadedDictEnd = params->forceWindow ? 0 : (U32)(iend - ms->window.base);
+    ms->forceNonContiguous = params->deterministicRefPrefix;
+
     if (srcSize <= HASH_READ_SIZE) return 0;
 
     ZSTD_overflowCorrectIfNeeded(ms, ws, params, ip, iend);
 
-    if (loadLdmDict)
-        ZSTD_ldm_fillHashTable(ls, ip, iend, &params->ldmParams);
-
     switch(params->cParams.strategy)
     {
     case ZSTD_fast:
-        ZSTD_fillHashTable(ms, iend, dtlm);
+        ZSTD_fillHashTable(ms, iend, dtlm, tfp);
         break;
     case ZSTD_dfast:
-        ZSTD_fillDoubleHashTable(ms, iend, dtlm);
+#ifndef ZSTD_EXCLUDE_DFAST_BLOCK_COMPRESSOR
+        ZSTD_fillDoubleHashTable(ms, iend, dtlm, tfp);
+#else
+        assert(0); /* shouldn't be called: cparams should've been adjusted. */
+#endif
         break;
 
     case ZSTD_greedy:
     case ZSTD_lazy:
    case ZSTD_lazy2:
+#if !defined(ZSTD_EXCLUDE_GREEDY_BLOCK_COMPRESSOR) \
+ || !defined(ZSTD_EXCLUDE_LAZY_BLOCK_COMPRESSOR) \
+ || !defined(ZSTD_EXCLUDE_LAZY2_BLOCK_COMPRESSOR)
         assert(srcSize >= HASH_READ_SIZE);
         if (ms->dedicatedDictSearch) {
             assert(ms->chainTable != NULL);
@@ -4174,7 +4916,7 @@ static size_t ZSTD_loadDictionaryContent(ZSTD_matchState_t* ms,
         } else {
             assert(params->useRowMatchFinder != ZSTD_ps_auto);
             if (params->useRowMatchFinder == ZSTD_ps_enable) {
-                size_t const tagTableSize = ((size_t)1 << params->cParams.hashLog) * sizeof(U16);
+                size_t const tagTableSize = ((size_t)1 << params->cParams.hashLog);
                 ZSTD_memset(ms->tagTable, 0, tagTableSize);
                 ZSTD_row_update(ms, iend-HASH_READ_SIZE);
                 DEBUGLOG(4, "Using row-based hash table for lazy dict");
@@ -4183,14 +4925,24 @@ static size_t ZSTD_loadDictionaryContent(ZSTD_matchState_t* ms,
                 DEBUGLOG(4, "Using chain-based hash table for lazy dict");
             }
         }
+#else
+        assert(0); /* shouldn't be called: cparams should've been adjusted. */
+#endif
         break;
 
     case ZSTD_btlazy2:   /* we want the dictionary table fully sorted */
     case ZSTD_btopt:
     case ZSTD_btultra:
     case ZSTD_btultra2:
+#if !defined(ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR) \
+ || !defined(ZSTD_EXCLUDE_BTOPT_BLOCK_COMPRESSOR) \
+ || !defined(ZSTD_EXCLUDE_BTULTRA_BLOCK_COMPRESSOR)
         assert(srcSize >= HASH_READ_SIZE);
+        DEBUGLOG(4, "Fill %u bytes into the Binary Tree", (unsigned)srcSize);
         ZSTD_updateTree(ms, iend-HASH_READ_SIZE, iend);
+#else
+        assert(0); /* shouldn't be called: cparams should've been adjusted. */
+#endif
         break;
 
     default:
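For scale: assuming upstream's current constants (ZSTD_SHORT_CACHE_TAG_BITS
== 8 and ZSTD_WINDOW_START_INDEX == 2), the short-cache cap computed
earlier in this function works out to
shortCacheMaxDictSize = (1u << 24) - 2 = 16777214 bytes, so a CDict built
from a larger dictionary only indexes its final ~16 MB suffix.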
@@ -4233,20 +4985,19 @@ size_t ZSTD_loadCEntropy(ZSTD_compressedBlockState_t* bs, void* workspace,
     {   unsigned maxSymbolValue = 255;
         unsigned hasZeroWeights = 1;
         size_t const hufHeaderSize = HUF_readCTable((HUF_CElt*)bs->entropy.huf.CTable, &maxSymbolValue, dictPtr,
-            dictEnd-dictPtr, &hasZeroWeights);
+            (size_t)(dictEnd-dictPtr), &hasZeroWeights);
 
         /* We only set the loaded table as valid if it contains all non-zero
          * weights. Otherwise, we set it to check */
-        if (!hasZeroWeights)
+        if (!hasZeroWeights && maxSymbolValue == 255)
             bs->entropy.huf.repeatMode = HUF_repeat_valid;
 
         RETURN_ERROR_IF(HUF_isError(hufHeaderSize), dictionary_corrupted, "");
-        RETURN_ERROR_IF(maxSymbolValue < 255, dictionary_corrupted, "");
         dictPtr += hufHeaderSize;
     }
 
     {   unsigned offcodeLog;
-        size_t const offcodeHeaderSize = FSE_readNCount(offcodeNCount, &offcodeMaxValue, &offcodeLog, dictPtr, dictEnd-dictPtr);
+        size_t const offcodeHeaderSize = FSE_readNCount(offcodeNCount, &offcodeMaxValue, &offcodeLog, dictPtr, (size_t)(dictEnd-dictPtr));
         RETURN_ERROR_IF(FSE_isError(offcodeHeaderSize), dictionary_corrupted, "");
         RETURN_ERROR_IF(offcodeLog > OffFSELog, dictionary_corrupted, "");
         /* fill all offset symbols to avoid garbage at end of table */
@@ -4261,7 +5012,7 @@ size_t ZSTD_loadCEntropy(ZSTD_compressedBlockState_t* bs, void* workspace,
 
     {   short matchlengthNCount[MaxML+1];
         unsigned matchlengthMaxValue = MaxML, matchlengthLog;
-        size_t const matchlengthHeaderSize = FSE_readNCount(matchlengthNCount, &matchlengthMaxValue, &matchlengthLog, dictPtr, dictEnd-dictPtr);
+        size_t const matchlengthHeaderSize = FSE_readNCount(matchlengthNCount, &matchlengthMaxValue, &matchlengthLog, dictPtr, (size_t)(dictEnd-dictPtr));
         RETURN_ERROR_IF(FSE_isError(matchlengthHeaderSize), dictionary_corrupted, "");
         RETURN_ERROR_IF(matchlengthLog > MLFSELog, dictionary_corrupted, "");
         RETURN_ERROR_IF(FSE_isError(FSE_buildCTable_wksp(
@@ -4275,7 +5026,7 @@ size_t ZSTD_loadCEntropy(ZSTD_compressedBlockState_t* bs, void* workspace,
 
     {   short litlengthNCount[MaxLL+1];
         unsigned litlengthMaxValue = MaxLL, litlengthLog;
-        size_t const litlengthHeaderSize = FSE_readNCount(litlengthNCount, &litlengthMaxValue, &litlengthLog, dictPtr, dictEnd-dictPtr);
+        size_t const litlengthHeaderSize = FSE_readNCount(litlengthNCount, &litlengthMaxValue, &litlengthLog, dictPtr, (size_t)(dictEnd-dictPtr));
         RETURN_ERROR_IF(FSE_isError(litlengthHeaderSize), dictionary_corrupted, "");
         RETURN_ERROR_IF(litlengthLog > LLFSELog, dictionary_corrupted, "");
         RETURN_ERROR_IF(FSE_isError(FSE_buildCTable_wksp(
@@ -4309,7 +5060,7 @@ size_t ZSTD_loadCEntropy(ZSTD_compressedBlockState_t* bs, void* workspace,
                 RETURN_ERROR_IF(bs->rep[u] > dictContentSize, dictionary_corrupted, "");
     }   }   }
 
-    return dictPtr - (const BYTE*)dict;
+    return (size_t)(dictPtr - (const BYTE*)dict);
 }
 
 /* Dictionary format :
@@ -4322,11 +5073,12 @@ size_t ZSTD_loadCEntropy(ZSTD_compressedBlockState_t* bs, void* workspace,
 *  dictSize supposed >= 8
 */
 static size_t ZSTD_loadZstdDictionary(ZSTD_compressedBlockState_t* bs,
-                                      ZSTD_matchState_t* ms,
+                                      ZSTD_MatchState_t* ms,
                                       ZSTD_cwksp* ws,
                                       ZSTD_CCtx_params const* params,
                                       const void* dict, size_t dictSize,
                                       ZSTD_dictTableLoadMethod_e dtlm,
+                                      ZSTD_tableFillPurpose_e tfp,
                                       void* workspace)
 {
     const BYTE* dictPtr = (const BYTE*)dict;
@@ -4345,7 +5097,7 @@ static size_t ZSTD_loadZstdDictionary(ZSTD_compressedBlockState_t* bs,
     {   size_t const dictContentSize = (size_t)(dictEnd - dictPtr);
         FORWARD_IF_ERROR(ZSTD_loadDictionaryContent(
-            ms, NULL, ws, params, dictPtr, dictContentSize, dtlm), "");
+            ms, NULL, ws, params, dictPtr, dictContentSize, dtlm, tfp), "");
     }
     return dictID;
 }
 
@@ -4354,13 +5106,14 @@ static size_t ZSTD_loadZstdDictionary(ZSTD_compressedBlockState_t* bs,
 * @return : dictID, or an error code */
 static size_t
 ZSTD_compress_insertDictionary(ZSTD_compressedBlockState_t* bs,
-                               ZSTD_matchState_t* ms,
+                               ZSTD_MatchState_t* ms,
                                ldmState_t* ls,
                                ZSTD_cwksp* ws,
                                const ZSTD_CCtx_params* params,
                                const void* dict, size_t dictSize,
                                ZSTD_dictContentType_e dictContentType,
                                ZSTD_dictTableLoadMethod_e dtlm,
+                               ZSTD_tableFillPurpose_e tfp,
                                void* workspace)
 {
     DEBUGLOG(4, "ZSTD_compress_insertDictionary (dictSize=%u)", (U32)dictSize);
@@ -4373,13 +5126,13 @@ ZSTD_compress_insertDictionary(ZSTD_compressedBlockState_t* bs,
 
     /* dict restricted modes */
     if (dictContentType == ZSTD_dct_rawContent)
-        return ZSTD_loadDictionaryContent(ms, ls, ws, params, dict, dictSize, dtlm);
+        return ZSTD_loadDictionaryContent(ms, ls, ws, params, dict, dictSize, dtlm, tfp);
 
     if (MEM_readLE32(dict) != ZSTD_MAGIC_DICTIONARY) {
         if (dictContentType == ZSTD_dct_auto) {
             DEBUGLOG(4, "raw content dictionary detected");
             return ZSTD_loadDictionaryContent(
-                ms, ls, ws, params, dict, dictSize, dtlm);
+                ms, ls, ws, params, dict, dictSize, dtlm, tfp);
         }
         RETURN_ERROR_IF(dictContentType == ZSTD_dct_fullDict, dictionary_wrong, "");
         assert(0);   /* impossible */
@@ -4387,13 +5140,14 @@ ZSTD_compress_insertDictionary(ZSTD_compressedBlockState_t* bs,
 
     /* dict as full zstd dictionary */
     return ZSTD_loadZstdDictionary(
-        bs, ms, ws, params, dict, dictSize, dtlm, workspace);
+        bs, ms, ws, params, dict, dictSize, dtlm, tfp, workspace);
 }
 
 #define ZSTD_USE_CDICT_PARAMS_SRCSIZE_CUTOFF (128 KB)
 #define ZSTD_USE_CDICT_PARAMS_DICTSIZE_MULTIPLIER (6ULL)
 
 /*! ZSTD_compressBegin_internal() :
+ * Assumption : either @dict OR @cdict (or none) is non-NULL, never both
  * @return : 0, or an error code */
 static size_t ZSTD_compressBegin_internal(ZSTD_CCtx* cctx,
                                     const void* dict, size_t dictSize,
@@ -4426,11 +5180,11 @@ static size_t ZSTD_compressBegin_internal(ZSTD_CCtx* cctx,
                         cctx->blockState.prevCBlock, &cctx->blockState.matchState,
                         &cctx->ldmState, &cctx->workspace, &cctx->appliedParams, cdict->dictContent,
                         cdict->dictContentSize, cdict->dictContentType, dtlm,
-                        cctx->entropyWorkspace)
+                        ZSTD_tfp_forCCtx, cctx->tmpWorkspace)
                   : ZSTD_compress_insertDictionary(
                         cctx->blockState.prevCBlock, &cctx->blockState.matchState,
                         &cctx->ldmState, &cctx->workspace, &cctx->appliedParams, dict, dictSize,
-                        dictContentType, dtlm, cctx->entropyWorkspace);
+                        dictContentType, dtlm, ZSTD_tfp_forCCtx, cctx->tmpWorkspace);
         FORWARD_IF_ERROR(dictID, "ZSTD_compress_insertDictionary failed");
         assert(dictID <= UINT_MAX);
         cctx->dictID = (U32)dictID;
@@ -4471,11 +5225,11 @@ size_t ZSTD_compressBegin_advanced(ZSTD_CCtx* cctx,
                                        &cctxParams, pledgedSrcSize);
 }
 
-size_t ZSTD_compressBegin_usingDict(ZSTD_CCtx* cctx, const void* dict, size_t dictSize, int compressionLevel)
+static size_t
+ZSTD_compressBegin_usingDict_deprecated(ZSTD_CCtx* cctx, const void* dict, size_t dictSize, int compressionLevel)
 {
     ZSTD_CCtx_params cctxParams;
-    {
-        ZSTD_parameters const params = ZSTD_getParams_internal(compressionLevel, ZSTD_CONTENTSIZE_UNKNOWN, dictSize, ZSTD_cpm_noAttachDict);
+    {   ZSTD_parameters const params = ZSTD_getParams_internal(compressionLevel, ZSTD_CONTENTSIZE_UNKNOWN, dictSize, ZSTD_cpm_noAttachDict);
         ZSTD_CCtxParams_init_internal(&cctxParams, &params, (compressionLevel == 0) ? ZSTD_CLEVEL_DEFAULT : compressionLevel);
     }
     DEBUGLOG(4, "ZSTD_compressBegin_usingDict (dictSize=%u)", (unsigned)dictSize);
@@ -4483,9 +5237,15 @@ size_t ZSTD_compressBegin_usingDict(ZSTD_CCtx* cctx, const void* dict, size_t di
                                        &cctxParams, ZSTD_CONTENTSIZE_UNKNOWN, ZSTDb_not_buffered);
 }
 
+size_t
+ZSTD_compressBegin_usingDict(ZSTD_CCtx* cctx, const void* dict, size_t dictSize, int compressionLevel)
+{
+    return ZSTD_compressBegin_usingDict_deprecated(cctx, dict, dictSize, compressionLevel);
+}
+
 size_t ZSTD_compressBegin(ZSTD_CCtx* cctx, int compressionLevel)
 {
-    return ZSTD_compressBegin_usingDict(cctx, NULL, 0, compressionLevel);
+    return ZSTD_compressBegin_usingDict_deprecated(cctx, NULL, 0, compressionLevel);
 }
 
 
@@ -4496,14 +5256,13 @@ static size_t ZSTD_writeEpilogue(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity)
 {
     BYTE* const ostart = (BYTE*)dst;
     BYTE* op = ostart;
-    size_t fhSize = 0;
 
     DEBUGLOG(4, "ZSTD_writeEpilogue");
     RETURN_ERROR_IF(cctx->stage == ZSTDcs_created, stage_wrong, "init missing");
 
     /* special case : empty frame */
     if (cctx->stage == ZSTDcs_init) {
-        fhSize = ZSTD_writeFrameHeader(dst, dstCapacity, &cctx->appliedParams, 0, 0);
+        size_t fhSize = ZSTD_writeFrameHeader(dst, dstCapacity, &cctx->appliedParams, 0, 0);
         FORWARD_IF_ERROR(fhSize, "ZSTD_writeFrameHeader failed");
         dstCapacity -= fhSize;
         op += fhSize;
@@ -4513,8 +5272,9 @@ static size_t ZSTD_writeEpilogue(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity)
     if (cctx->stage != ZSTDcs_ending) {
         /* write one last empty block, make it the "last" block */
         U32 const cBlockHeader24 = 1 /* last block */ + (((U32)bt_raw)<<1) + 0;
-        RETURN_ERROR_IF(dstCapacity<4, dstSize_tooSmall, "no room for epilogue");
-        MEM_writeLE32(op, cBlockHeader24);
+        ZSTD_STATIC_ASSERT(ZSTD_BLOCKHEADERSIZE == 3);
+        RETURN_ERROR_IF(dstCapacity<3, dstSize_tooSmall, "no room for epilogue");
+        MEM_writeLE24(op, cBlockHeader24);
         op += ZSTD_blockHeaderSize;
         dstCapacity -= ZSTD_blockHeaderSize;
     }
@@ -4528,7 +5288,7 @@ static size_t ZSTD_writeEpilogue(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity)
     }
 
     cctx->stage = ZSTDcs_created;  /* return to "created but no init" status */
-    return op-ostart;
+    return (size_t)(op-ostart);
 }
 
 void ZSTD_CCtx_trace(ZSTD_CCtx* cctx, size_t extraCSize)
@@ -4537,9 +5297,9 @@ void ZSTD_CCtx_trace(ZSTD_CCtx* cctx, size_t extraCSize)
     (void)extraCSize;
 }
 
-size_t ZSTD_compressEnd (ZSTD_CCtx* cctx,
-                         void* dst, size_t dstCapacity,
-                         const void* src, size_t srcSize)
+size_t ZSTD_compressEnd_public(ZSTD_CCtx* cctx,
+                               void* dst, size_t dstCapacity,
+                               const void* src, size_t srcSize)
 {
     size_t endResult;
     size_t const cSize = ZSTD_compressContinue_internal(cctx,
@@ -4563,6 +5323,14 @@ size_t ZSTD_compressEnd (ZSTD_CCtx* cctx,
     return cSize + endResult;
 }
 
+/* NOTE: Must just wrap ZSTD_compressEnd_public() */
+size_t ZSTD_compressEnd(ZSTD_CCtx* cctx,
+                        void* dst, size_t dstCapacity,
+                        const void* src, size_t srcSize)
+{
+    return ZSTD_compressEnd_public(cctx, dst, dstCapacity, src, srcSize);
+}
+
 size_t ZSTD_compress_advanced (ZSTD_CCtx* cctx,
                                void* dst, size_t dstCapacity,
                                const void* src, size_t srcSize,
@@ -4591,7 +5359,7 @@ size_t ZSTD_compress_advanced_internal(
     FORWARD_IF_ERROR( ZSTD_compressBegin_internal(cctx,
                          dict, dictSize, ZSTD_dct_auto, ZSTD_dtlm_fast, NULL,
                          params, srcSize, ZSTDb_not_buffered) , "");
-    return ZSTD_compressEnd(cctx, dst, dstCapacity, src, srcSize);
+    return ZSTD_compressEnd_public(cctx, dst, dstCapacity, src, srcSize);
 }
 
 size_t ZSTD_compress_usingDict(ZSTD_CCtx* cctx,
@@ -4709,7 +5477,7 @@ static size_t ZSTD_initCDict_internal(
     {   size_t const dictID = ZSTD_compress_insertDictionary(
                 &cdict->cBlockState, &cdict->matchState, NULL, &cdict->workspace,
                 &params, cdict->dictContent, cdict->dictContentSize,
-                dictContentType, ZSTD_dtlm_full, cdict->entropyWorkspace);
+                dictContentType, ZSTD_dtlm_full, ZSTD_tfp_forCDict, cdict->entropyWorkspace);
         FORWARD_IF_ERROR(dictID, "ZSTD_compress_insertDictionary failed");
         assert(dictID <= (size_t)(U32)-1);
         cdict->dictID = (U32)dictID;
@@ -4719,14 +5487,16 @@ static size_t ZSTD_initCDict_internal(
     return 0;
 }
 
-static ZSTD_CDict* ZSTD_createCDict_advanced_internal(size_t dictSize,
-                                      ZSTD_dictLoadMethod_e dictLoadMethod,
-                                      ZSTD_compressionParameters cParams,
-                                      ZSTD_paramSwitch_e useRowMatchFinder,
-                                      U32 enableDedicatedDictSearch,
-                                      ZSTD_customMem customMem)
+static ZSTD_CDict*
+ZSTD_createCDict_advanced_internal(size_t dictSize,
+                                   ZSTD_dictLoadMethod_e dictLoadMethod,
+                                   ZSTD_compressionParameters cParams,
+                                   ZSTD_ParamSwitch_e useRowMatchFinder,
+                                   int enableDedicatedDictSearch,
+                                   ZSTD_customMem customMem)
 {
     if ((!customMem.customAlloc) ^ (!customMem.customFree)) return NULL;
+    DEBUGLOG(3, "ZSTD_createCDict_advanced_internal (dictSize=%u)", (unsigned)dictSize);
 
     {   size_t const workspaceSize =
             ZSTD_cwksp_alloc_size(sizeof(ZSTD_CDict)) +
@@ -4763,6 +5533,7 @@ ZSTD_CDict* ZSTD_createCDict_advanced(const void* dictBuffer, size_t dictSize,
 {
     ZSTD_CCtx_params cctxParams;
     ZSTD_memset(&cctxParams, 0, sizeof(cctxParams));
+    DEBUGLOG(3, "ZSTD_createCDict_advanced, dictSize=%u, mode=%u", (unsigned)dictSize, (unsigned)dictContentType);
     ZSTD_CCtxParams_init(&cctxParams, 0);
     cctxParams.cParams = cParams;
     cctxParams.customMem = customMem;
@@ -4783,7 +5554,7 @@ ZSTD_CDict* ZSTD_createCDict_advanced2(
     ZSTD_compressionParameters cParams;
     ZSTD_CDict* cdict;
 
-    DEBUGLOG(3, "ZSTD_createCDict_advanced2, mode %u", (unsigned)dictContentType);
+    DEBUGLOG(3, "ZSTD_createCDict_advanced2, dictSize=%u, mode=%u", (unsigned)dictSize, (unsigned)dictContentType);
     if (!customMem.customAlloc ^ !customMem.customFree) return NULL;
 
     if (cctxParams.enableDedicatedDictSearch) {
@@ -4802,7 +5573,7 @@ ZSTD_CDict* ZSTD_createCDict_advanced2(
             &cctxParams, ZSTD_CONTENTSIZE_UNKNOWN, dictSize, ZSTD_cpm_createCDict);
     }
 
-    DEBUGLOG(3, "ZSTD_createCDict_advanced2: DDS: %u", cctxParams.enableDedicatedDictSearch);
+    DEBUGLOG(3, "ZSTD_createCDict_advanced2: DedicatedDictSearch=%u", cctxParams.enableDedicatedDictSearch);
     cctxParams.cParams = cParams;
     cctxParams.useRowMatchFinder = ZSTD_resolveRowMatchFinderMode(cctxParams.useRowMatchFinder, &cParams);
 
@@ -4810,10 +5581,8 @@ ZSTD_CDict* ZSTD_createCDict_advanced2(
                         dictLoadMethod, cctxParams.cParams,
                         cctxParams.useRowMatchFinder, cctxParams.enableDedicatedDictSearch,
                         customMem);
-    if (!cdict)
-        return NULL;
 
-    if (ZSTD_isError( ZSTD_initCDict_internal(cdict,
+    if (!cdict || ZSTD_isError( ZSTD_initCDict_internal(cdict,
                                     dict, dictSize,
                                     dictLoadMethod, dictContentType,
                                     cctxParams) )) {
@@ -4867,7 +5636,7 @@ size_t ZSTD_freeCDict(ZSTD_CDict* cdict)
  *  workspaceSize: Use ZSTD_estimateCDictSize()
  *                 to determine how large workspace must be.
  *  cParams : use ZSTD_getCParams() to transform a compression level
- *            into its relevants cParams.
+ *            into its relevant cParams.
 *  @return : pointer to ZSTD_CDict*, or NULL if error (size too small)
 *  Note : there is no corresponding "free" function.
 *  Since workspace was allocated externally, it must be freed externally.
@@ -4879,7 +5648,7 @@ const ZSTD_CDict* ZSTD_initStaticCDict(
                                  ZSTD_dictContentType_e dictContentType,
                                  ZSTD_compressionParameters cParams)
 {
-    ZSTD_paramSwitch_e const useRowMatchFinder = ZSTD_resolveRowMatchFinderMode(ZSTD_ps_auto, &cParams);
+    ZSTD_ParamSwitch_e const useRowMatchFinder = ZSTD_resolveRowMatchFinderMode(ZSTD_ps_auto, &cParams);
     /* enableDedicatedDictSearch == 1 ensures matchstate is not too small in case this CDict will be used for DDS + row hash */
     size_t const matchStateSize = ZSTD_sizeof_matchState(&cParams, useRowMatchFinder, /* enableDedicatedDictSearch */ 1, /* forCCtx */ 0);
     size_t const neededSize = ZSTD_cwksp_alloc_size(sizeof(ZSTD_CDict))
@@ -4890,6 +5659,7 @@ const ZSTD_CDict* ZSTD_initStaticCDict(
     ZSTD_CDict* cdict;
     ZSTD_CCtx_params params;
 
+    DEBUGLOG(4, "ZSTD_initStaticCDict (dictSize==%u)", (unsigned)dictSize);
     if ((size_t)workspace & 7) return NULL;  /* 8-aligned */
 
     {
@@ -4900,14 +5670,13 @@ const ZSTD_CDict* ZSTD_initStaticCDict(
         ZSTD_cwksp_move(&cdict->workspace, &ws);
     }
 
-    DEBUGLOG(4, "(workspaceSize < neededSize) : (%u < %u) => %u",
-        (unsigned)workspaceSize, (unsigned)neededSize, (unsigned)(workspaceSize < neededSize));
     if (workspaceSize < neededSize) return NULL;
 
     ZSTD_CCtxParams_init(&params, 0);
     params.cParams = cParams;
     params.useRowMatchFinder = useRowMatchFinder;
     cdict->useRowMatchFinder = useRowMatchFinder;
+    cdict->compressionLevel = ZSTD_NO_CLEVEL;
 
     if (ZSTD_isError( ZSTD_initCDict_internal(cdict,
                                               dict, dictSize,
@@ -4987,12 +5756,17 @@ size_t ZSTD_compressBegin_usingCDict_advanced(
 
 /* ZSTD_compressBegin_usingCDict() :
 * cdict must be != NULL */
-size_t ZSTD_compressBegin_usingCDict(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict)
+size_t ZSTD_compressBegin_usingCDict_deprecated(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict)
 {
     ZSTD_frameParameters const fParams = { 0 /*content*/, 0 /*checksum*/, 0 /*noDictID*/ };
     return ZSTD_compressBegin_usingCDict_internal(cctx, cdict, fParams, ZSTD_CONTENTSIZE_UNKNOWN);
 }
 
+size_t ZSTD_compressBegin_usingCDict(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict)
+{
+    return ZSTD_compressBegin_usingCDict_deprecated(cctx, cdict);
+}
+
 /*! ZSTD_compress_usingCDict_internal():
  * Implementation of various ZSTD_compress_usingCDict* functions.
 */
@@ -5002,7 +5776,7 @@ static size_t ZSTD_compress_usingCDict_internal(ZSTD_CCtx* cctx,
                                 const ZSTD_CDict* cdict, ZSTD_frameParameters fParams)
 {
     FORWARD_IF_ERROR(ZSTD_compressBegin_usingCDict_internal(cctx, cdict, fParams, srcSize), ""); /* will check if cdict != NULL */
-    return ZSTD_compressEnd(cctx, dst, dstCapacity, src, srcSize);
+    return ZSTD_compressEnd_public(cctx, dst, dstCapacity, src, srcSize);
 }
 
 /*! ZSTD_compress_usingCDict_advanced():
@@ -5068,7 +5842,7 @@ size_t ZSTD_CStreamOutSize(void)
     return ZSTD_compressBound(ZSTD_BLOCKSIZE_MAX) + ZSTD_blockHeaderSize + 4 /* 32-bits hash */ ;
 }
 
-static ZSTD_cParamMode_e ZSTD_getCParamMode(ZSTD_CDict const* cdict, ZSTD_CCtx_params const* params, U64 pledgedSrcSize)
+static ZSTD_CParamMode_e ZSTD_getCParamMode(ZSTD_CDict const* cdict, ZSTD_CCtx_params const* params, U64 pledgedSrcSize)
 {
     if (cdict != NULL && ZSTD_shouldAttachDict(cdict, params, pledgedSrcSize))
         return ZSTD_cpm_attachDict;
@@ -5199,30 +5973,41 @@ size_t ZSTD_initCStream(ZSTD_CStream* zcs, int compressionLevel)
 
 static size_t ZSTD_nextInputSizeHint(const ZSTD_CCtx* cctx)
 {
-    size_t hintInSize = cctx->inBuffTarget - cctx->inBuffPos;
-    if (hintInSize==0) hintInSize = cctx->blockSize;
-    return hintInSize;
+    if (cctx->appliedParams.inBufferMode == ZSTD_bm_stable) {
+        return cctx->blockSizeMax - cctx->stableIn_notConsumed;
+    }
+    assert(cctx->appliedParams.inBufferMode == ZSTD_bm_buffered);
+    {   size_t hintInSize = cctx->inBuffTarget - cctx->inBuffPos;
+        if (hintInSize==0) hintInSize = cctx->blockSizeMax;
+        return hintInSize;
+    }
 }
 
 /* ZSTD_compressStream_generic():
 *  internal function for all *compressStream*() variants
- *  non-static, because can be called from zstdmt_compress.c
- * @return : hint size for next input */
+ * @return : hint size for next input to complete ongoing block */
 static size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
                                           ZSTD_outBuffer* output,
                                           ZSTD_inBuffer* input,
                                           ZSTD_EndDirective const flushMode)
 {
-    const char* const istart = (const char*)input->src;
-    const char* const iend = input->size != 0 ? istart + input->size : istart;
-    const char* ip = input->pos != 0 ? istart + input->pos : istart;
-    char* const ostart = (char*)output->dst;
-    char* const oend = output->size != 0 ? ostart + output->size : ostart;
-    char* op = output->pos != 0 ? ostart + output->pos : ostart;
+    const char* const istart = (assert(input != NULL), (const char*)input->src);
+    const char* const iend = (istart != NULL) ? istart + input->size : istart;
+    const char* ip = (istart != NULL) ? istart + input->pos : istart;
+    char* const ostart = (assert(output != NULL), (char*)output->dst);
+    char* const oend = (ostart != NULL) ? ostart + output->size : ostart;
+    char* op = (ostart != NULL) ? ostart + output->pos : ostart;
     U32 someMoreWork = 1;
 
     /* check expectations */
-    DEBUGLOG(5, "ZSTD_compressStream_generic, flush=%u", (unsigned)flushMode);
+    DEBUGLOG(5, "ZSTD_compressStream_generic, flush=%i, srcSize = %zu", (int)flushMode, input->size - input->pos);
+    assert(zcs != NULL);
+    if (zcs->appliedParams.inBufferMode == ZSTD_bm_stable) {
+        assert(input->pos >= zcs->stableIn_notConsumed);
+        input->pos -= zcs->stableIn_notConsumed;
+        if (ip) ip -= zcs->stableIn_notConsumed;
+        zcs->stableIn_notConsumed = 0;
+    }
     if (zcs->appliedParams.inBufferMode == ZSTD_bm_buffered) {
         assert(zcs->inBuff != NULL);
         assert(zcs->inBuffSize > 0);
@@ -5231,8 +6016,10 @@ static size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
         assert(zcs->outBuff != NULL);
         assert(zcs->outBuffSize > 0);
     }
-    assert(output->pos <= output->size);
+    if (input->src == NULL) assert(input->size == 0);
     assert(input->pos <= input->size);
+    if (output->dst == NULL) assert(output->size == 0);
+    assert(output->pos <= output->size);
     assert((U32)flushMode <= (U32)ZSTD_e_end);
 
     while (someMoreWork) {
@@ -5243,12 +6030,13 @@ static size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
 
         case zcss_load:
             if ( (flushMode == ZSTD_e_end)
-              && ( (size_t)(oend-op) >= ZSTD_compressBound(iend-ip)     /* Enough output space */
+              && ( (size_t)(oend-op) >= ZSTD_compressBound((size_t)(iend-ip))  /* Enough output space */
                 || zcs->appliedParams.outBufferMode == ZSTD_bm_stable)  /* OR we are allowed to return dstSizeTooSmall */
              && (zcs->inBuffPos == 0) ) {
                /* shortcut to compression pass directly into output buffer */
-                size_t const cSize = ZSTD_compressEnd(zcs,
-                                                op, oend-op, ip, iend-ip);
+                size_t const cSize = ZSTD_compressEnd_public(zcs,
+                                                op, (size_t)(oend-op),
+                                                ip, (size_t)(iend-ip));
                DEBUGLOG(4, "ZSTD_compressEnd : cSize=%u", (unsigned)cSize);
                FORWARD_IF_ERROR(cSize, "ZSTD_compressEnd failed");
                ip = iend;
@@ -5262,10 +6050,9 @@ static size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
                 size_t const toLoad = zcs->inBuffTarget - zcs->inBuffPos;
                 size_t const loaded = ZSTD_limitCopy(
                                         zcs->inBuff + zcs->inBuffPos, toLoad,
-                                        ip, iend-ip);
+                                        ip, (size_t)(iend-ip));
                 zcs->inBuffPos += loaded;
-                if (loaded != 0)
-                    ip += loaded;
+                if (ip) ip += loaded;
                 if ( (flushMode == ZSTD_e_continue)
                   && (zcs->inBuffPos < zcs->inBuffTarget) ) {
                     /* not enough input to fill full block : stop here */
@@ -5276,16 +6063,29 @@ static size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
                     /* empty */
                     someMoreWork = 0; break;
                 }
+            } else {
+                assert(zcs->appliedParams.inBufferMode == ZSTD_bm_stable);
+                if ( (flushMode == ZSTD_e_continue)
+                  && ( (size_t)(iend - ip) < zcs->blockSizeMax) ) {
+                    /* can't compress a full block : stop here */
+                    zcs->stableIn_notConsumed = (size_t)(iend - ip);
+                    ip = iend;  /* pretend to have consumed input */
+                    someMoreWork = 0; break;
+                }
+                if ( (flushMode == ZSTD_e_flush)
+                  && (ip == iend) ) {
+                    /* empty */
+                    someMoreWork = 0; break;
+                }
             }
             /* compress current block (note : this stage cannot be stopped in the middle) */
             DEBUGLOG(5, "stream compression stage (flushMode==%u)", flushMode);
             {   int const inputBuffered = (zcs->appliedParams.inBufferMode == ZSTD_bm_buffered);
                 void* cDst;
                 size_t cSize;
-                size_t oSize = oend-op;
-                size_t const iSize = inputBuffered
-                    ? zcs->inBuffPos - zcs->inToCompress
-                    : MIN((size_t)(iend - ip), zcs->blockSize);
+                size_t oSize = (size_t)(oend-op);
+                size_t const iSize = inputBuffered ? zcs->inBuffPos - zcs->inToCompress
+                                                   : MIN((size_t)(iend - ip), zcs->blockSizeMax);
                 if (oSize >= ZSTD_compressBound(iSize) || zcs->appliedParams.outBufferMode == ZSTD_bm_stable)
                     cDst = op;   /* compress into output buffer, to skip flush stage */
                 else
@@ -5293,34 +6093,31 @@ static size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
                 if (inputBuffered) {
                     unsigned const lastBlock = (flushMode == ZSTD_e_end) && (ip==iend);
                     cSize = lastBlock ?
-                            ZSTD_compressEnd(zcs, cDst, oSize,
+                            ZSTD_compressEnd_public(zcs, cDst, oSize,
                                         zcs->inBuff + zcs->inToCompress, iSize) :
-                            ZSTD_compressContinue(zcs, cDst, oSize,
+                            ZSTD_compressContinue_public(zcs, cDst, oSize,
                                         zcs->inBuff + zcs->inToCompress, iSize);
                     FORWARD_IF_ERROR(cSize, "%s", lastBlock ? "ZSTD_compressEnd failed" : "ZSTD_compressContinue failed");
                     zcs->frameEnded = lastBlock;
                     /* prepare next block */
-                    zcs->inBuffTarget = zcs->inBuffPos + zcs->blockSize;
+                    zcs->inBuffTarget = zcs->inBuffPos + zcs->blockSizeMax;
                     if (zcs->inBuffTarget > zcs->inBuffSize)
-                        zcs->inBuffPos = 0, zcs->inBuffTarget = zcs->blockSize;
+                        zcs->inBuffPos = 0, zcs->inBuffTarget = zcs->blockSizeMax;
                     DEBUGLOG(5, "inBuffTarget:%u / inBuffSize:%u",
                              (unsigned)zcs->inBuffTarget, (unsigned)zcs->inBuffSize);
                     if (!lastBlock)
                         assert(zcs->inBuffTarget <= zcs->inBuffSize);
                     zcs->inToCompress = zcs->inBuffPos;
-                } else {
-                    unsigned const lastBlock = (ip + iSize == iend);
-                    assert(flushMode == ZSTD_e_end /* Already validated */);
+                } else { /* !inputBuffered, hence ZSTD_bm_stable */
+                    unsigned const lastBlock = (flushMode == ZSTD_e_end) && (ip + iSize == iend);
                     cSize = lastBlock ?
-                            ZSTD_compressEnd(zcs, cDst, oSize, ip, iSize) :
-                            ZSTD_compressContinue(zcs, cDst, oSize, ip, iSize);
+                            ZSTD_compressEnd_public(zcs, cDst, oSize, ip, iSize) :
+                            ZSTD_compressContinue_public(zcs, cDst, oSize, ip, iSize);
                     /* Consume the input prior to error checking to mirror buffered mode. */
-                    if (iSize > 0)
-                        ip += iSize;
+                    if (ip) ip += iSize;
                     FORWARD_IF_ERROR(cSize, "%s", lastBlock ? "ZSTD_compressEnd failed" : "ZSTD_compressContinue failed");
                     zcs->frameEnded = lastBlock;
-                    if (lastBlock)
-                        assert(ip == iend);
+                    if (lastBlock) assert(ip == iend);
                 }
                 if (cDst == op) {  /* no need to flush */
                     op += cSize;
@@ -5369,8 +6166,8 @@ static size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
         }
     }
 
-    input->pos = ip - istart;
-    output->pos = op - ostart;
+    input->pos = (size_t)(ip - istart);
+    output->pos = (size_t)(op - ostart);
     if (zcs->frameEnded) return 0;
     return ZSTD_nextInputSizeHint(zcs);
 }
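For callers of the stable-buffer path above, usage looks roughly like the
following sketch (upstream's public API, with made-up buffer variables
srcBuffer/srcSize/dstBuffer/dstCapacity; the kernel wrappers differ):

    /* Sketch: compress one stable input buffer in a single pass.
     * With ZSTD_c_stableInBuffer set, src must not move between calls. */
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_stableInBuffer, 1);
    {   ZSTD_inBuffer input = { srcBuffer, srcSize, 0 };
        ZSTD_outBuffer output = { dstBuffer, dstCapacity, 0 };
        size_t remaining;
        do {
            remaining = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_end);
        } while (remaining != 0 && !ZSTD_isError(remaining));
    }
    ZSTD_freeCCtx(cctx);
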
@@ -5390,8 +6187,10 @@ size_t ZSTD_compressStream(ZSTD_CStream* zcs, ZSTD_outBuffer* output, ZSTD_inBuf
 
 /* After a compression call set the expected input/output buffer.
  * This is validated at the start of the next compression call.
  */
-static void ZSTD_setBufferExpectations(ZSTD_CCtx* cctx, ZSTD_outBuffer const* output, ZSTD_inBuffer const* input)
+static void
+ZSTD_setBufferExpectations(ZSTD_CCtx* cctx, const ZSTD_outBuffer* output, const ZSTD_inBuffer* input)
 {
+    DEBUGLOG(5, "ZSTD_setBufferExpectations (for advanced stable in/out modes)");
     if (cctx->appliedParams.inBufferMode == ZSTD_bm_stable) {
         cctx->expectedInBuffer = *input;
     }
@@ -5410,22 +6209,27 @@ static size_t ZSTD_checkBufferStability(ZSTD_CCtx const* cctx,
 {
     if (cctx->appliedParams.inBufferMode == ZSTD_bm_stable) {
         ZSTD_inBuffer const expect = cctx->expectedInBuffer;
-        if (expect.src != input->src || expect.pos != input->pos || expect.size != input->size)
-            RETURN_ERROR(srcBuffer_wrong, "ZSTD_c_stableInBuffer enabled but input differs!");
-        if (endOp != ZSTD_e_end)
-            RETURN_ERROR(srcBuffer_wrong, "ZSTD_c_stableInBuffer can only be used with ZSTD_e_end!");
+        if (expect.src != input->src || expect.pos != input->pos)
+            RETURN_ERROR(stabilityCondition_notRespected, "ZSTD_c_stableInBuffer enabled but input differs!");
     }
+    (void)endOp;
     if (cctx->appliedParams.outBufferMode == ZSTD_bm_stable) {
         size_t const outBufferSize = output->size - output->pos;
         if (cctx->expectedOutBufferSize != outBufferSize)
-            RETURN_ERROR(dstBuffer_wrong, "ZSTD_c_stableOutBuffer enabled but output size differs!");
+            RETURN_ERROR(stabilityCondition_notRespected, "ZSTD_c_stableOutBuffer enabled but output size differs!");
     }
     return 0;
 }
 
+/*
+ * If @endOp == ZSTD_e_end, @inSize becomes pledgedSrcSize.
+ * Otherwise, it's ignored.
+ * @return: 0 on success, or a ZSTD_error code otherwise.
+ */
 static size_t ZSTD_CCtx_init_compressStream2(ZSTD_CCtx* cctx,
                                              ZSTD_EndDirective endOp,
-                                             size_t inSize) {
+                                             size_t inSize)
+{
     ZSTD_CCtx_params params = cctx->requestedParams;
     ZSTD_prefixDict const prefixDict = cctx->prefixDict;
     FORWARD_IF_ERROR( ZSTD_initLocalDict(cctx) , ""); /* Init the local dict if present. */
@@ -5438,21 +6242,24 @@ static size_t ZSTD_CCtx_init_compressStream2(ZSTD_CCtx* cctx,
         */
         params.compressionLevel = cctx->cdict->compressionLevel;
     }
-    DEBUGLOG(4, "ZSTD_compressStream2 : transparent init stage");
-    if (endOp == ZSTD_e_end) cctx->pledgedSrcSizePlusOne = inSize + 1;  /* auto-fix pledgedSrcSize */
-    {
-        size_t const dictSize = prefixDict.dict
+    DEBUGLOG(4, "ZSTD_CCtx_init_compressStream2 : transparent init stage");
+    if (endOp == ZSTD_e_end) cctx->pledgedSrcSizePlusOne = inSize + 1;  /* auto-determine pledgedSrcSize */
+
+    {   size_t const dictSize = prefixDict.dict
                 ? prefixDict.dictSize
                 : (cctx->cdict ? cctx->cdict->dictContentSize : 0);
-        ZSTD_cParamMode_e const mode = ZSTD_getCParamMode(cctx->cdict, &params, cctx->pledgedSrcSizePlusOne - 1);
+        ZSTD_CParamMode_e const mode = ZSTD_getCParamMode(cctx->cdict, &params, cctx->pledgedSrcSizePlusOne - 1);
         params.cParams = ZSTD_getCParamsFromCCtxParams(
                 &params, cctx->pledgedSrcSizePlusOne-1,
                 dictSize, mode);
     }
 
-    params.useBlockSplitter = ZSTD_resolveBlockSplitterMode(params.useBlockSplitter, &params.cParams);
+    params.postBlockSplitter = ZSTD_resolveBlockSplitterMode(params.postBlockSplitter, &params.cParams);
     params.ldmParams.enableLdm = ZSTD_resolveEnableLdm(params.ldmParams.enableLdm, &params.cParams);
     params.useRowMatchFinder = ZSTD_resolveRowMatchFinderMode(params.useRowMatchFinder, &params.cParams);
+    params.validateSequences = ZSTD_resolveExternalSequenceValidation(params.validateSequences);
+    params.maxBlockSize = ZSTD_resolveMaxBlockSize(params.maxBlockSize);
+    params.searchForExternalRepcodes = ZSTD_resolveExternalRepcodeSearch(params.searchForExternalRepcodes, params.compressionLevel);
 
     {   U64 const pledgedSrcSize = cctx->pledgedSrcSizePlusOne - 1;
         assert(!ZSTD_isError(ZSTD_checkCParams(params.cParams)));
@@ -5468,7 +6275,7 @@ static size_t ZSTD_CCtx_init_compressStream2(ZSTD_CCtx* cctx,
             /* for small input: avoid automatic flush on reaching end of block, since
             * it would require to add a 3-bytes null block to end frame
             */
-            cctx->inBuffTarget = cctx->blockSize + (cctx->blockSize == pledgedSrcSize);
+            cctx->inBuffTarget = cctx->blockSizeMax + (cctx->blockSizeMax == pledgedSrcSize);
         } else {
             cctx->inBuffTarget = 0;
         }
@@ -5479,6 +6286,8 @@ static size_t ZSTD_CCtx_init_compressStream2(ZSTD_CCtx* cctx,
     return 0;
 }
 
+/* @return provides a minimum amount of data remaining to be flushed from internal buffers
+ */
 size_t ZSTD_compressStream2( ZSTD_CCtx* cctx,
                              ZSTD_outBuffer* output,
                              ZSTD_inBuffer* input,
@@ -5493,8 +6302,27 @@ size_t ZSTD_compressStream2( ZSTD_CCtx* cctx,
 
     /* transparent initialization stage */
     if (cctx->streamStage == zcss_init) {
-        FORWARD_IF_ERROR(ZSTD_CCtx_init_compressStream2(cctx, endOp, input->size), "CompressStream2 initialization failed");
-        ZSTD_setBufferExpectations(cctx, output, input);    /* Set initial buffer expectations now that we've initialized */
+        size_t const inputSize = input->size - input->pos;  /* no obligation to start from pos==0 */
+        size_t const totalInputSize = inputSize + cctx->stableIn_notConsumed;
+        if ( (cctx->requestedParams.inBufferMode == ZSTD_bm_stable) /* input is presumed stable, across invocations */
+          && (endOp == ZSTD_e_continue)                             /* no flush requested, more input to come */
+          && (totalInputSize < ZSTD_BLOCKSIZE_MAX) ) {              /* not even reached one block yet */
+            if (cctx->stableIn_notConsumed) {  /* not the first time */
+                /* check stable source guarantees */
+                RETURN_ERROR_IF(input->src != cctx->expectedInBuffer.src, stabilityCondition_notRespected, "stableInBuffer condition not respected: wrong src pointer");
+                RETURN_ERROR_IF(input->pos != cctx->expectedInBuffer.size, stabilityCondition_notRespected, "stableInBuffer condition not respected: externally modified pos");
+            }
+            /* pretend input was consumed, to give a sense forward progress */
+            input->pos = input->size;
+            /* save stable inBuffer, for later control, and flush/end */
+            cctx->expectedInBuffer = *input;
+            /* but actually input wasn't consumed, so keep track of position from where compression shall resume */
+            cctx->stableIn_notConsumed += inputSize;
+            /* don't initialize yet, wait for the first block of flush() order, for better parameters adaptation */
+            return ZSTD_FRAMEHEADERSIZE_MIN(cctx->requestedParams.format);  /* at least some header to produce */
+        }
+        FORWARD_IF_ERROR(ZSTD_CCtx_init_compressStream2(cctx, endOp, totalInputSize), "compressStream2 initialization failed");
+        ZSTD_setBufferExpectations(cctx, output, input);   /* Set initial buffer expectations now that we've initialized */
     }
     /* end of transparent initialization stage */
 
@@ -5512,13 +6340,20 @@ size_t ZSTD_compressStream2_simpleArgs (
                 const void* src, size_t srcSize, size_t* srcPos,
                 ZSTD_EndDirective endOp)
 {
-    ZSTD_outBuffer output = { dst, dstCapacity, *dstPos };
-    ZSTD_inBuffer  input  = { src, srcSize, *srcPos };
+    ZSTD_outBuffer output;
+    ZSTD_inBuffer  input;
+    output.dst = dst;
+    output.size = dstCapacity;
+    output.pos = *dstPos;
+    input.src = src;
+    input.size = srcSize;
+    input.pos = *srcPos;
     /* ZSTD_compressStream2() will check validity of dstPos and srcPos */
-    size_t const cErr = ZSTD_compressStream2(cctx, &output, &input, endOp);
-    *dstPos = output.pos;
-    *srcPos = input.pos;
-    return cErr;
+    {   size_t const cErr = ZSTD_compressStream2(cctx, &output, &input, endOp);
+        *dstPos = output.pos;
+        *srcPos = input.pos;
+        return cErr;
+    }
 }
 
 size_t ZSTD_compress2(ZSTD_CCtx* cctx,
@@ -5541,6 +6376,7 @@ size_t ZSTD_compress2(ZSTD_CCtx* cctx,
         /* Reset to the original values. */
         cctx->requestedParams.inBufferMode = originalInBufferMode;
         cctx->requestedParams.outBufferMode = originalOutBufferMode;
+        FORWARD_IF_ERROR(result, "ZSTD_compressStream2_simpleArgs failed");
         if (result != 0) {  /* compression not completed, due to lack of output space */
             assert(oPos == dstCapacity);
@@ -5551,64 +6387,67 @@ size_t ZSTD_compress2(ZSTD_CCtx* cctx,
     }
 }
 
-typedef struct {
-    U32 idx;             /* Index in array of ZSTD_Sequence */
-    U32 posInSequence;   /* Position within sequence at idx */
-    size_t posInSrc;     /* Number of bytes given by sequences provided so far */
-} ZSTD_sequencePosition;
-
 /* ZSTD_validateSequence() :
- * @offCode : is presumed to follow format required by ZSTD_storeSeq()
+ * @offBase : must use the format required by ZSTD_storeSeq()
 * @returns a ZSTD error code if sequence is not valid */
 static size_t
-ZSTD_validateSequence(U32 offCode, U32 matchLength,
-                      size_t posInSrc, U32 windowLog, size_t dictSize)
+ZSTD_validateSequence(U32 offBase, U32 matchLength, U32 minMatch,
                       size_t posInSrc, U32 windowLog, size_t dictSize, int useSequenceProducer)
 {
-    U32 const windowSize = 1 << windowLog;
+    U32 const windowSize = 1u << windowLog;
     /* posInSrc represents the amount of data the decoder would decode up to this point.
     * As long as the amount of data decoded is less than or equal to window size, offsets may be
     * larger than the total length of output decoded in order to reference the dict, even larger than
     * window size. After output surpasses windowSize, we're limited to windowSize offsets again.
     */
     size_t const offsetBound = posInSrc > windowSize ? (size_t)windowSize : posInSrc + (size_t)dictSize;
-    RETURN_ERROR_IF(offCode > STORE_OFFSET(offsetBound), corruption_detected, "Offset too large!");
-    RETURN_ERROR_IF(matchLength < MINMATCH, corruption_detected, "Matchlength too small");
+    size_t const matchLenLowerBound = (minMatch == 3 || useSequenceProducer) ? 3 : 4;
+    RETURN_ERROR_IF(offBase > OFFSET_TO_OFFBASE(offsetBound), externalSequences_invalid, "Offset too large!");
+    /* Validate maxNbSeq is large enough for the given matchLength and minMatch */
+    RETURN_ERROR_IF(matchLength < matchLenLowerBound, externalSequences_invalid, "Matchlength too small for the minMatch");
     return 0;
 }
 
 /* Returns an offset code, given a sequence's raw offset, the ongoing repcode array, and whether litLength == 0 */
-static U32 ZSTD_finalizeOffCode(U32 rawOffset, const U32 rep[ZSTD_REP_NUM], U32 ll0)
+static U32 ZSTD_finalizeOffBase(U32 rawOffset, const U32 rep[ZSTD_REP_NUM], U32 ll0)
 {
-    U32 offCode = STORE_OFFSET(rawOffset);
+    U32 offBase = OFFSET_TO_OFFBASE(rawOffset);
 
     if (!ll0 && rawOffset == rep[0]) {
-        offCode = STORE_REPCODE_1;
+        offBase = REPCODE1_TO_OFFBASE;
     } else if (rawOffset == rep[1]) {
-        offCode = STORE_REPCODE(2 - ll0);
+        offBase = REPCODE_TO_OFFBASE(2 - ll0);
    } else if (rawOffset == rep[2]) {
-        offCode = STORE_REPCODE(3 - ll0);
+        offBase = REPCODE_TO_OFFBASE(3 - ll0);
    } else if (ll0 && rawOffset == rep[0] - 1) {
-        offCode = STORE_REPCODE_3;
+        offBase = REPCODE3_TO_OFFBASE;
    }
-    return offCode;
+    return offBase;
 }
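A quick worked example of the mapping above: with repcode history
rep = {8, 4, 2} and litLength > 0 (ll0 == 0), a raw offset of 8 becomes
REPCODE1_TO_OFFBASE, 4 becomes REPCODE_TO_OFFBASE(2), and 2 becomes
REPCODE_TO_OFFBASE(3); any other offset, say 100, is stored as
OFFSET_TO_OFFBASE(100). With litLength == 0 (ll0 == 1), rep[0] itself is
not eligible, but rep[0] - 1 == 7 maps to REPCODE3_TO_OFFBASE.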
+            offBase = ZSTD_finalizeOffBase(inSeqs[idx].offset, updatedRepcodes.rep, ll0);
+            ZSTD_updateRep(updatedRepcodes.rep, offBase, ll0);
+        }
 
-        DEBUGLOG(6, "Storing sequence: (of: %u, ml: %u, ll: %u)", offCode, matchLength, litLength);
+        DEBUGLOG(6, "Storing sequence: (of: %u, ml: %u, ll: %u)", offBase, matchLength, litLength);
         if (cctx->appliedParams.validateSequences) {
             seqPos->posInSrc += litLength + matchLength;
-            FORWARD_IF_ERROR(ZSTD_validateSequence(offCode, matchLength, seqPos->posInSrc,
-                                                cctx->appliedParams.cParams.windowLog, dictSize),
+            FORWARD_IF_ERROR(ZSTD_validateSequence(offBase, matchLength, cctx->appliedParams.cParams.minMatch,
+                                                seqPos->posInSrc,
+                                                cctx->appliedParams.cParams.windowLog, dictSize,
+                                                ZSTD_hasExtSeqProd(&cctx->appliedParams)),
                                                 "Sequence validation failed");
        }
-        RETURN_ERROR_IF(idx - seqPos->idx > cctx->seqStore.maxNbSeq, memory_allocation,
+        RETURN_ERROR_IF(idx - seqPos->idx >= cctx->seqStore.maxNbSeq, externalSequences_invalid,
                        "Not enough memory allocated. Try adjusting ZSTD_c_minMatch.");
-        ZSTD_storeSeq(&cctx->seqStore, litLength, ip, iend, offCode, matchLength);
+        ZSTD_storeSeq(&cctx->seqStore, litLength, ip, iend, offBase, matchLength);
         ip += matchLength + litLength;
     }
-    ZSTD_memcpy(cctx->blockState.nextCBlock->rep, updatedRepcodes.rep, sizeof(repcodes_t));
+    RETURN_ERROR_IF(idx == inSeqsSize, externalSequences_invalid, "Block delimiter not found.");
+
+    /* If we skipped repcode search while parsing, we need to update repcodes now */
+    assert(externalRepSearch != ZSTD_ps_auto);
+    assert(idx >= startIdx);
+    if (externalRepSearch == ZSTD_ps_disable && idx != startIdx) {
+        U32* const rep = updatedRepcodes.rep;
+        U32 lastSeqIdx = idx - 1; /* index of last non-block-delimiter sequence */
+
+        if (lastSeqIdx >= startIdx + 2) {
+            rep[2] = inSeqs[lastSeqIdx - 2].offset;
+            rep[1] = inSeqs[lastSeqIdx - 1].offset;
+            rep[0] = inSeqs[lastSeqIdx].offset;
+        } else if (lastSeqIdx == startIdx + 1) {
+            rep[2] = rep[0];
+            rep[1] = inSeqs[lastSeqIdx - 1].offset;
+            rep[0] = inSeqs[lastSeqIdx].offset;
+        } else {
+            assert(lastSeqIdx == startIdx);
+            rep[2] = rep[1];
+            rep[1] = rep[0];
+            rep[0] = inSeqs[lastSeqIdx].offset;
+        }
+    }
+
+    ZSTD_memcpy(cctx->blockState.nextCBlock->rep, updatedRepcodes.rep, sizeof(Repcodes_t));
 
     if (inSeqs[idx].litLength) {
         DEBUGLOG(6, "Storing last literals of size: %u", inSeqs[idx].litLength);
@@ -5644,37 +6516,43 @@ ZSTD_copySequencesToSeqStoreExplicitBlockDelim(ZSTD_CCtx* cctx,
         ip += inSeqs[idx].litLength;
         seqPos->posInSrc += inSeqs[idx].litLength;
     }
-    RETURN_ERROR_IF(ip != iend, corruption_detected, "Blocksize doesn't agree with block delimiter!");
+    RETURN_ERROR_IF(ip != iend, externalSequences_invalid, "Blocksize doesn't agree with block delimiter!");
     seqPos->idx = idx+1;
-    return 0;
+    return blockSize;
 }
 
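As an illustration of the explicit-delimiter format consumed above (a
sketch using upstream's public ZSTD_Sequence type; the byte values are
made up):

    /* A 100-byte block described with explicit delimiters:
     * 20 literals, a 30-byte match at offset 20,
     * 10 literals, a 25-byte match at offset 55,
     * then a delimiter carrying the final 15 literals. */
    const ZSTD_Sequence seqs[] = {
        { 20, 20, 30, 0 },   /* offset, litLength, matchLength, rep (unused) */
        { 55, 10, 25, 0 },
        {  0, 15,  0, 0 },   /* block delimiter: offset == 0 && matchLength == 0 */
    };
    /* sum of litLength + matchLength: 20+30+10+25+15 == 100 == blockSize */
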
+ * Occasionally, we may want to reduce the actual number of bytes consumed= from @src + * to avoid splitting a match, notably if it would produce a match smaller= than MINMATCH. * - * Occasionally, we may want to change the actual number of bytes we consu= med from inSeqs to - * avoid splitting a match, or to avoid splitting a match such that it wou= ld produce a match - * smaller than MINMATCH. In this case, we return the number of bytes that= we didn't read from this block. + * @returns the number of bytes consumed from @src, necessarily <=3D @bloc= kSize. + * Otherwise, it may return a ZSTD error if something went wrong. */ static size_t -ZSTD_copySequencesToSeqStoreNoBlockDelim(ZSTD_CCtx* cctx, ZSTD_sequencePos= ition* seqPos, - const ZSTD_Sequence* const inSeqs, size= _t inSeqsSize, - const void* src, size_t blockSize) +ZSTD_transferSequences_noDelim(ZSTD_CCtx* cctx, + ZSTD_SequencePosition* seqPos, + const ZSTD_Sequence* const inSeqs, size_t inSeqsS= ize, + const void* src, size_t blockSize, + ZSTD_ParamSwitch_e externalRepSearch) { U32 idx =3D seqPos->idx; U32 startPosInSequence =3D seqPos->posInSequence; U32 endPosInSequence =3D seqPos->posInSequence + (U32)blockSize; size_t dictSize; - BYTE const* ip =3D (BYTE const*)(src); - BYTE const* iend =3D ip + blockSize; /* May be adjusted if we decide = to process fewer than blockSize bytes */ - repcodes_t updatedRepcodes; + const BYTE* const istart =3D (const BYTE*)(src); + const BYTE* ip =3D istart; + const BYTE* iend =3D istart + blockSize; /* May be adjusted if we dec= ide to process fewer than blockSize bytes */ + Repcodes_t updatedRepcodes; U32 bytesAdjustment =3D 0; U32 finalMatchSplit =3D 0; =20 + /* TODO(embg) support fast parsing mode in noBlockDelim mode */ + (void)externalRepSearch; + if (cctx->cdict) { dictSize =3D cctx->cdict->dictContentSize; } else if (cctx->prefixDict.dict) { @@ -5682,15 +6560,15 @@ ZSTD_copySequencesToSeqStoreNoBlockDelim(ZSTD_CCtx*= cctx, ZSTD_sequencePosition* } else { dictSize =3D 0; } - DEBUGLOG(5, "ZSTD_copySequencesToSeqStore: idx: %u PIS: %u blockSize: = %zu", idx, startPosInSequence, blockSize); + DEBUGLOG(5, "ZSTD_transferSequences_noDelim: idx: %u PIS: %u blockSize= : %zu", idx, startPosInSequence, blockSize); DEBUGLOG(5, "Start seq: idx: %u (of: %u ml: %u ll: %u)", idx, inSeqs[i= dx].offset, inSeqs[idx].matchLength, inSeqs[idx].litLength); - ZSTD_memcpy(updatedRepcodes.rep, cctx->blockState.prevCBlock->rep, siz= eof(repcodes_t)); + ZSTD_memcpy(updatedRepcodes.rep, cctx->blockState.prevCBlock->rep, siz= eof(Repcodes_t)); while (endPosInSequence && idx < inSeqsSize && !finalMatchSplit) { const ZSTD_Sequence currSeq =3D inSeqs[idx]; U32 litLength =3D currSeq.litLength; U32 matchLength =3D currSeq.matchLength; U32 const rawOffset =3D currSeq.offset; - U32 offCode; + U32 offBase; =20 /* Modify the sequence depending on where endPosInSequence lies */ if (endPosInSequence >=3D currSeq.litLength + currSeq.matchLength)= { @@ -5704,7 +6582,6 @@ ZSTD_copySequencesToSeqStoreNoBlockDelim(ZSTD_CCtx* c= ctx, ZSTD_sequencePosition* /* Move to the next sequence */ endPosInSequence -=3D currSeq.litLength + currSeq.matchLength; startPosInSequence =3D 0; - idx++; } else { /* This is the final (partial) sequence we're adding from inSe= qs, and endPosInSequence does not reach the end of the match. 
So, we have to split t= he sequence */ @@ -5744,58 +6621,113 @@ ZSTD_copySequencesToSeqStoreNoBlockDelim(ZSTD_CCtx= * cctx, ZSTD_sequencePosition* } /* Check if this offset can be represented with a repcode */ { U32 const ll0 =3D (litLength =3D=3D 0); - offCode =3D ZSTD_finalizeOffCode(rawOffset, updatedRepcodes.re= p, ll0); - ZSTD_updateRep(updatedRepcodes.rep, offCode, ll0); + offBase =3D ZSTD_finalizeOffBase(rawOffset, updatedRepcodes.re= p, ll0); + ZSTD_updateRep(updatedRepcodes.rep, offBase, ll0); } =20 if (cctx->appliedParams.validateSequences) { seqPos->posInSrc +=3D litLength + matchLength; - FORWARD_IF_ERROR(ZSTD_validateSequence(offCode, matchLength, s= eqPos->posInSrc, - cctx->appliedParams.cPa= rams.windowLog, dictSize), + FORWARD_IF_ERROR(ZSTD_validateSequence(offBase, matchLength, c= ctx->appliedParams.cParams.minMatch, seqPos->posInSrc, + cctx->appliedParams.cPa= rams.windowLog, dictSize, ZSTD_hasExtSeqProd(&cctx->appliedParams)), "Sequence validation fa= iled"); } - DEBUGLOG(6, "Storing sequence: (of: %u, ml: %u, ll: %u)", offCode,= matchLength, litLength); - RETURN_ERROR_IF(idx - seqPos->idx > cctx->seqStore.maxNbSeq, memor= y_allocation, + DEBUGLOG(6, "Storing sequence: (of: %u, ml: %u, ll: %u)", offBase,= matchLength, litLength); + RETURN_ERROR_IF(idx - seqPos->idx >=3D cctx->seqStore.maxNbSeq, ex= ternalSequences_invalid, "Not enough memory allocated. Try adjusting ZSTD_c= _minMatch."); - ZSTD_storeSeq(&cctx->seqStore, litLength, ip, iend, offCode, match= Length); + ZSTD_storeSeq(&cctx->seqStore, litLength, ip, iend, offBase, match= Length); ip +=3D matchLength + litLength; + if (!finalMatchSplit) + idx++; /* Next Sequence */ } DEBUGLOG(5, "Ending seq: idx: %u (of: %u ml: %u ll: %u)", idx, inSeqs[= idx].offset, inSeqs[idx].matchLength, inSeqs[idx].litLength); assert(idx =3D=3D inSeqsSize || endPosInSequence <=3D inSeqs[idx].litL= ength + inSeqs[idx].matchLength); seqPos->idx =3D idx; seqPos->posInSequence =3D endPosInSequence; - ZSTD_memcpy(cctx->blockState.nextCBlock->rep, updatedRepcodes.rep, siz= eof(repcodes_t)); + ZSTD_memcpy(cctx->blockState.nextCBlock->rep, updatedRepcodes.rep, siz= eof(Repcodes_t)); =20 iend -=3D bytesAdjustment; if (ip !=3D iend) { /* Store any last literals */ - U32 lastLLSize =3D (U32)(iend - ip); + U32 const lastLLSize =3D (U32)(iend - ip); assert(ip <=3D iend); DEBUGLOG(6, "Storing last literals of size: %u", lastLLSize); ZSTD_storeLastLiterals(&cctx->seqStore, ip, lastLLSize); seqPos->posInSrc +=3D lastLLSize; } =20 - return bytesAdjustment; + return (size_t)(iend-istart); } =20 -typedef size_t (*ZSTD_sequenceCopier) (ZSTD_CCtx* cctx, ZSTD_sequencePosit= ion* seqPos, - const ZSTD_Sequence* const inSeqs, = size_t inSeqsSize, - const void* src, size_t blockSize); -static ZSTD_sequenceCopier ZSTD_selectSequenceCopier(ZSTD_sequenceFormat_e= mode) +/* @seqPos represents a position within @inSeqs, + * it is read and updated by this function, + * once the goal to produce a block of size @blockSize is reached. + * @return: nb of bytes consumed from @src, necessarily <=3D @blockSize. 
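+ * Two implementations exist, selected by ZSTD_selectSequenceCopier() below:
+ * ZSTD_transferSequences_wBlockDelim() for explicit block delimiters,
+ * and ZSTD_transferSequences_noDelim() otherwise.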
+ */ +typedef size_t (*ZSTD_SequenceCopier_f)(ZSTD_CCtx* cctx, + ZSTD_SequencePosition* seqPos, + const ZSTD_Sequence* const inSeqs, size_= t inSeqsSize, + const void* src, size_t blockSize, + ZSTD_ParamSwitch_e externalRepSear= ch); + +static ZSTD_SequenceCopier_f ZSTD_selectSequenceCopier(ZSTD_SequenceFormat= _e mode) { - ZSTD_sequenceCopier sequenceCopier =3D NULL; - assert(ZSTD_cParam_withinBounds(ZSTD_c_blockDelimiters, mode)); + assert(ZSTD_cParam_withinBounds(ZSTD_c_blockDelimiters, (int)mode)); if (mode =3D=3D ZSTD_sf_explicitBlockDelimiters) { - return ZSTD_copySequencesToSeqStoreExplicitBlockDelim; - } else if (mode =3D=3D ZSTD_sf_noBlockDelimiters) { - return ZSTD_copySequencesToSeqStoreNoBlockDelim; + return ZSTD_transferSequences_wBlockDelim; + } + assert(mode =3D=3D ZSTD_sf_noBlockDelimiters); + return ZSTD_transferSequences_noDelim; +} + +/* Discover the size of next block by searching for the delimiter. + * Note that a block delimiter **must** exist in this mode, + * otherwise it's an input error. + * The block size retrieved will be later compared to ensure it remains wi= thin bounds */ +static size_t +blockSize_explicitDelimiter(const ZSTD_Sequence* inSeqs, size_t inSeqsSize= , ZSTD_SequencePosition seqPos) +{ + int end =3D 0; + size_t blockSize =3D 0; + size_t spos =3D seqPos.idx; + DEBUGLOG(6, "blockSize_explicitDelimiter : seq %zu / %zu", spos, inSeq= sSize); + assert(spos <=3D inSeqsSize); + while (spos < inSeqsSize) { + end =3D (inSeqs[spos].offset =3D=3D 0); + blockSize +=3D inSeqs[spos].litLength + inSeqs[spos].matchLength; + if (end) { + if (inSeqs[spos].matchLength !=3D 0) + RETURN_ERROR(externalSequences_invalid, "delimiter format = error : both matchlength and offset must be =3D=3D 0"); + break; + } + spos++; } - assert(sequenceCopier !=3D NULL); - return sequenceCopier; + if (!end) + RETURN_ERROR(externalSequences_invalid, "Reached end of sequences = without finding a block delimiter"); + return blockSize; } =20 -/* Compress, block-by-block, all of the sequences given. +static size_t determine_blockSize(ZSTD_SequenceFormat_e mode, + size_t blockSize, size_t remaining, + const ZSTD_Sequence* inSeqs, size_t inSeqsSize, + ZSTD_SequencePosition seqPos) +{ + DEBUGLOG(6, "determine_blockSize : remainingSize =3D %zu", remaining); + if (mode =3D=3D ZSTD_sf_noBlockDelimiters) { + /* Note: more a "target" block size */ + return MIN(remaining, blockSize); + } + assert(mode =3D=3D ZSTD_sf_explicitBlockDelimiters); + { size_t const explicitBlockSize =3D blockSize_explicitDelimiter(inS= eqs, inSeqsSize, seqPos); + FORWARD_IF_ERROR(explicitBlockSize, "Error while determining block= size with explicit delimiters"); + if (explicitBlockSize > blockSize) + RETURN_ERROR(externalSequences_invalid, "sequences incorrectly= define a too large block"); + if (explicitBlockSize > remaining) + RETURN_ERROR(externalSequences_invalid, "sequences define a fr= ame longer than source"); + return explicitBlockSize; + } +} + +/* Compress all provided sequences, block-by-block. * * Returns the cumulative size of all compressed blocks (including their h= eaders), * otherwise a ZSTD error. 
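As a reader aid, here is a minimal sketch, not part of the patch, of the
sequence layout that blockSize_explicitDelimiter() and determine_blockSize()
operate on. All field values are invented, and the header name may differ
in-kernel:

    #include <zstd.h>   /* ZSTD_Sequence; in-kernel builds use linux/zstd_lib.h */

    static void example_block(ZSTD_Sequence seqs[3])
    {
        /* two ordinary sequences, then a block delimiter:
         * a delimiter has offset == 0 and matchLength == 0, and its
         * litLength carries the block's trailing literals. */
        seqs[0].offset = 64; seqs[0].litLength = 100; seqs[0].matchLength = 20; seqs[0].rep = 0;
        seqs[1].offset = 7;  seqs[1].litLength = 0;   seqs[1].matchLength = 12; seqs[1].rep = 0;
        seqs[2].offset = 0;  seqs[2].litLength = 5;   seqs[2].matchLength = 0;  seqs[2].rep = 0;
        /* blockSize_explicitDelimiter() sums the lengths:
         * (100+20) + (0+12) + 5 == 137 bytes for this block, which
         * determine_blockSize() then bounds-checks against the source. */
    }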
@@ -5807,15 +6739,12 @@ ZSTD_compressSequences_internal(ZSTD_CCtx* cctx, const void* src, size_t srcSize) { size_t cSize =3D 0; - U32 lastBlock; - size_t blockSize; - size_t compressedSeqsSize; size_t remaining =3D srcSize; - ZSTD_sequencePosition seqPos =3D {0, 0, 0}; + ZSTD_SequencePosition seqPos =3D {0, 0, 0}; =20 - BYTE const* ip =3D (BYTE const*)src; + const BYTE* ip =3D (BYTE const*)src; BYTE* op =3D (BYTE*)dst; - ZSTD_sequenceCopier const sequenceCopier =3D ZSTD_selectSequenceCopier= (cctx->appliedParams.blockDelimiters); + ZSTD_SequenceCopier_f const sequenceCopier =3D ZSTD_selectSequenceCopi= er(cctx->appliedParams.blockDelimiters); =20 DEBUGLOG(4, "ZSTD_compressSequences_internal srcSize: %zu, inSeqsSize:= %zu", srcSize, inSeqsSize); /* Special case: empty frame */ @@ -5829,22 +6758,29 @@ ZSTD_compressSequences_internal(ZSTD_CCtx* cctx, } =20 while (remaining) { + size_t compressedSeqsSize; size_t cBlockSize; - size_t additionalByteAdjustment; - lastBlock =3D remaining <=3D cctx->blockSize; - blockSize =3D lastBlock ? (U32)remaining : (U32)cctx->blockSize; + size_t blockSize =3D determine_blockSize(cctx->appliedParams.block= Delimiters, + cctx->blockSizeMax, remaining, + inSeqs, inSeqsSize, seqPos); + U32 const lastBlock =3D (blockSize =3D=3D remaining); + FORWARD_IF_ERROR(blockSize, "Error while trying to determine block= size"); + assert(blockSize <=3D remaining); ZSTD_resetSeqStore(&cctx->seqStore); - DEBUGLOG(4, "Working on new block. Blocksize: %zu", blockSize); =20 - additionalByteAdjustment =3D sequenceCopier(cctx, &seqPos, inSeqs,= inSeqsSize, ip, blockSize); - FORWARD_IF_ERROR(additionalByteAdjustment, "Bad sequence copy"); - blockSize -=3D additionalByteAdjustment; + blockSize =3D sequenceCopier(cctx, + &seqPos, inSeqs, inSeqsSize, + ip, blockSize, + cctx->appliedParams.searchForExternalRe= pcodes); + FORWARD_IF_ERROR(blockSize, "Bad sequence copy"); =20 /* If blocks are too small, emit as a nocompress block */ - if (blockSize < MIN_CBLOCK_SIZE+ZSTD_blockHeaderSize+1) { + /* TODO: See 3090. We reduced MIN_CBLOCK_SIZE from 3 to 2 so to co= mpensate we are adding + * additional 1. 
We need to revisit and change this logic to be mo= re consistent */ + if (blockSize < MIN_CBLOCK_SIZE+ZSTD_blockHeaderSize+1+1) { cBlockSize =3D ZSTD_noCompressBlock(op, dstCapacity, ip, block= Size, lastBlock); FORWARD_IF_ERROR(cBlockSize, "Nocompress block failed"); - DEBUGLOG(4, "Block too small, writing out nocompress block: cS= ize: %zu", cBlockSize); + DEBUGLOG(5, "Block too small (%zu): data remains uncompressed:= cSize=3D%zu", blockSize, cBlockSize); cSize +=3D cBlockSize; ip +=3D blockSize; op +=3D cBlockSize; @@ -5853,35 +6789,36 @@ ZSTD_compressSequences_internal(ZSTD_CCtx* cctx, continue; } =20 + RETURN_ERROR_IF(dstCapacity < ZSTD_blockHeaderSize, dstSize_tooSma= ll, "not enough dstCapacity to write a new compressed block"); compressedSeqsSize =3D ZSTD_entropyCompressSeqStore(&cctx->seqStor= e, &cctx->blockState.prevCBlock->entropy, &cc= tx->blockState.nextCBlock->entropy, &cctx->appliedParams, op + ZSTD_blockHeaderSize /* Leave space f= or block header */, dstCapacity - ZSTD_blockHeaderSize, blockSize, - cctx->entropyWorkspace, ENTROPY_WORKSPACE_= SIZE /* statically allocated in resetCCtx */, + cctx->tmpWorkspace, cctx->tmpWkspSize /* s= tatically allocated in resetCCtx */, cctx->bmi2); FORWARD_IF_ERROR(compressedSeqsSize, "Compressing sequences of blo= ck failed"); - DEBUGLOG(4, "Compressed sequences size: %zu", compressedSeqsSize); + DEBUGLOG(5, "Compressed sequences size: %zu", compressedSeqsSize); =20 if (!cctx->isFirstBlock && ZSTD_maybeRLE(&cctx->seqStore) && - ZSTD_isRLE((BYTE const*)src, srcSize)) { - /* We don't want to emit our first block as a RLE even if it q= ualifies because - * doing so will cause the decoder (cli only) to throw a "shoul= d consume all input error." - * This is only an issue for zstd <=3D v1.4.3 - */ + ZSTD_isRLE(ip, blockSize)) { + /* Note: don't emit the first block as RLE even if it qualifie= s because + * doing so will cause the decoder (cli <=3D v1.4.3 only) to t= hrow an (invalid) error + * "should consume all input error." 
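+             * (For reference: an RLE block stores one byte repeated blockSize
+             * times; setting compressedSeqsSize = 1 routes the block to
+             * ZSTD_rleCompressBlock() below.)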
+ */ compressedSeqsSize =3D 1; } =20 if (compressedSeqsSize =3D=3D 0) { /* ZSTD_noCompressBlock writes the block header as well */ cBlockSize =3D ZSTD_noCompressBlock(op, dstCapacity, ip, block= Size, lastBlock); - FORWARD_IF_ERROR(cBlockSize, "Nocompress block failed"); - DEBUGLOG(4, "Writing out nocompress block, size: %zu", cBlockS= ize); + FORWARD_IF_ERROR(cBlockSize, "ZSTD_noCompressBlock failed"); + DEBUGLOG(5, "Writing out nocompress block, size: %zu", cBlockS= ize); } else if (compressedSeqsSize =3D=3D 1) { cBlockSize =3D ZSTD_rleCompressBlock(op, dstCapacity, *ip, blo= ckSize, lastBlock); - FORWARD_IF_ERROR(cBlockSize, "RLE compress block failed"); - DEBUGLOG(4, "Writing out RLE block, size: %zu", cBlockSize); + FORWARD_IF_ERROR(cBlockSize, "ZSTD_rleCompressBlock failed"); + DEBUGLOG(5, "Writing out RLE block, size: %zu", cBlockSize); } else { U32 cBlockHeader; /* Error checking and repcodes update */ @@ -5893,11 +6830,10 @@ ZSTD_compressSequences_internal(ZSTD_CCtx* cctx, cBlockHeader =3D lastBlock + (((U32)bt_compressed)<<1) + (U32)= (compressedSeqsSize << 3); MEM_writeLE24(op, cBlockHeader); cBlockSize =3D ZSTD_blockHeaderSize + compressedSeqsSize; - DEBUGLOG(4, "Writing out compressed block, size: %zu", cBlockS= ize); + DEBUGLOG(5, "Writing out compressed block, size: %zu", cBlockS= ize); } =20 cSize +=3D cBlockSize; - DEBUGLOG(4, "cSize running total: %zu", cSize); =20 if (lastBlock) { break; @@ -5908,41 +6844,50 @@ ZSTD_compressSequences_internal(ZSTD_CCtx* cctx, dstCapacity -=3D cBlockSize; cctx->isFirstBlock =3D 0; } + DEBUGLOG(5, "cSize running total: %zu (remaining dstCapacity=3D%zu= )", cSize, dstCapacity); } =20 + DEBUGLOG(4, "cSize final total: %zu", cSize); return cSize; } =20 -size_t ZSTD_compressSequences(ZSTD_CCtx* const cctx, void* dst, size_t dst= Capacity, +size_t ZSTD_compressSequences(ZSTD_CCtx* cctx, + void* dst, size_t dstCapacity, const ZSTD_Sequence* inSeqs, size_t inSeqsSi= ze, const void* src, size_t srcSize) { BYTE* op =3D (BYTE*)dst; size_t cSize =3D 0; - size_t compressedBlocksSize =3D 0; - size_t frameHeaderSize =3D 0; =20 /* Transparent initialization stage, same as compressStream2() */ - DEBUGLOG(3, "ZSTD_compressSequences()"); + DEBUGLOG(4, "ZSTD_compressSequences (nbSeqs=3D%zu,dstCapacity=3D%zu)",= inSeqsSize, dstCapacity); assert(cctx !=3D NULL); FORWARD_IF_ERROR(ZSTD_CCtx_init_compressStream2(cctx, ZSTD_e_end, srcS= ize), "CCtx initialization failed"); + /* Begin writing output, starting with frame header */ - frameHeaderSize =3D ZSTD_writeFrameHeader(op, dstCapacity, &cctx->appl= iedParams, srcSize, cctx->dictID); - op +=3D frameHeaderSize; - dstCapacity -=3D frameHeaderSize; - cSize +=3D frameHeaderSize; + { size_t const frameHeaderSize =3D ZSTD_writeFrameHeader(op, dstCapa= city, + &cctx->appliedParams, srcSize, cctx->dictID); + op +=3D frameHeaderSize; + assert(frameHeaderSize <=3D dstCapacity); + dstCapacity -=3D frameHeaderSize; + cSize +=3D frameHeaderSize; + } if (cctx->appliedParams.fParams.checksumFlag && srcSize) { xxh64_update(&cctx->xxhState, src, srcSize); } - /* cSize includes block header size and compressed sequences size */ - compressedBlocksSize =3D ZSTD_compressSequences_internal(cctx, + + /* Now generate compressed blocks */ + { size_t const cBlocksSize =3D ZSTD_compressSequences_internal(cctx, op, dstCapacity, inSeqs, inSeqsS= ize, src, srcSize); - FORWARD_IF_ERROR(compressedBlocksSize, "Compressing blocks failed!"); - cSize +=3D compressedBlocksSize; - dstCapacity -=3D compressedBlocksSize; + 
FORWARD_IF_ERROR(cBlocksSize, "Compressing blocks failed!");
+        cSize += cBlocksSize;
+        assert(cBlocksSize <= dstCapacity);
+        dstCapacity -= cBlocksSize;
+    }

+    /* Complete with frame checksum, if needed */
     if (cctx->appliedParams.fParams.checksumFlag) {
         U32 const checksum = (U32) xxh64_digest(&cctx->xxhState);
         RETURN_ERROR_IF(dstCapacity<4, dstSize_tooSmall, "no room for checksum");
@@ -5951,26 +6896,557 @@ size_t ZSTD_compressSequences(ZSTD_CCtx* const cctx, void* dst, size_t dstCapaci
         cSize += 4;
     }

-    DEBUGLOG(3, "Final compressed size: %zu", cSize);
+    DEBUGLOG(4, "Final compressed size: %zu", cSize);
+    return cSize;
+}
+
+
+#if defined(__AVX2__)
+
+#include <immintrin.h> /* AVX2 intrinsics */
+
+/*
+ * Convert 2 sequences per iteration, using AVX2 intrinsics:
+ *   - offset -> offBase = offset + 2
+ *   - litLength -> (U16) litLength
+ *   - matchLength -> (U16)(matchLength - 3)
+ *   - rep is ignored
+ * Store only 8 bytes per SeqDef (offBase[4], litLength[2], mlBase[2]).
+ *
+ * At the end, instead of extracting two __m128i,
+ * we use _mm256_permute4x64_epi64(..., 0xE8) to move lane2 into lane1,
+ * then store the lower 16 bytes in one go.
+ *
+ * @returns 0 on success, with no long length detected
+ * @returns > 0 if there is one long length (> 65535),
+ *           indicating the position, and type.
+ */
+static size_t convertSequences_noRepcodes(
+    SeqDef* dstSeqs,
+    const ZSTD_Sequence* inSeqs,
+    size_t nbSequences)
+{
+    /*
+     * addition:
+     *   For each 128-bit half: (offset+2, litLength+0, matchLength-3, rep+0)
+     */
+    const __m256i addition = _mm256_setr_epi32(
+        ZSTD_REP_NUM, 0, -MINMATCH, 0,    /* for sequence i */
+        ZSTD_REP_NUM, 0, -MINMATCH, 0     /* for sequence i+1 */
+    );
+
+    /* limit: check if there is a long length */
+    const __m256i limit = _mm256_set1_epi32(65535);
+
+    /*
+     * shuffle mask for byte-level rearrangement in each 128-bit half:
+     *
+     * Input layout (after addition) per 128-bit half:
+     *   [ offset+2 (4 bytes) | litLength (4 bytes) | matchLength (4 bytes) | rep (4 bytes) ]
+     * We only need:
+     *   offBase (4 bytes) = offset+2
+     *   litLength (2 bytes) = low 2 bytes of litLength
+     *   mlBase (2 bytes) = low 2 bytes of (matchLength)
+     * => Bytes [0..3, 4..5, 8..9], zero the rest.
+     */
+    const __m256i mask = _mm256_setr_epi8(
+        /* For the lower 128 bits => sequence i */
+        0, 1, 2, 3,       /* offset+2 */
+        4, 5,             /* litLength (16 bits) */
+        8, 9,             /* matchLength (16 bits) */
+        (BYTE)0x80, (BYTE)0x80, (BYTE)0x80, (BYTE)0x80,
+        (BYTE)0x80, (BYTE)0x80, (BYTE)0x80, (BYTE)0x80,
+
+        /* For the upper 128 bits => sequence i+1 */
+        16,17,18,19,      /* offset+2 */
+        20,21,            /* litLength */
+        24,25,            /* matchLength */
+        (BYTE)0x80, (BYTE)0x80, (BYTE)0x80, (BYTE)0x80,
+        (BYTE)0x80, (BYTE)0x80, (BYTE)0x80, (BYTE)0x80
+    );
+
+    /*
+     * Next, we'll use _mm256_permute4x64_epi64(vshf, 0xE8).
+     * Explanation of 0xE8 = 11101000b => [lane0, lane2, lane2, lane3].
+     * So the lower 128 bits become [lane0, lane2] => combining seq0 and seq1.
+     */
+#define PERM_LANE_0X_E8 0xE8  /* [0,2,2,3] in lane indices */
+
+    size_t longLen = 0, i = 0;
+
+    /* AVX permutation depends on the specific definition of target structures */
+    ZSTD_STATIC_ASSERT(sizeof(ZSTD_Sequence) == 16);
+    ZSTD_STATIC_ASSERT(offsetof(ZSTD_Sequence, offset) == 0);
+    ZSTD_STATIC_ASSERT(offsetof(ZSTD_Sequence, litLength) == 4);
+    ZSTD_STATIC_ASSERT(offsetof(ZSTD_Sequence, matchLength) == 8);
+    ZSTD_STATIC_ASSERT(sizeof(SeqDef) == 8);
+    ZSTD_STATIC_ASSERT(offsetof(SeqDef, offBase) == 0);
+    ZSTD_STATIC_ASSERT(offsetof(SeqDef, litLength) == 4);
+    ZSTD_STATIC_ASSERT(offsetof(SeqDef, mlBase) == 6);
+
+    /* Process 2 sequences per loop iteration */
+    for (; i + 1 < nbSequences; i += 2) {
+        /* Load 2 ZSTD_Sequence (32 bytes) */
+        __m256i vin  = _mm256_loadu_si256((const __m256i*)(const void*)&inSeqs[i]);
+
+        /* Add {2, 0, -3, 0} in each 128-bit half */
+        __m256i vadd = _mm256_add_epi32(vin, addition);
+
+        /* Check for long length */
+        __m256i ll_cmp = _mm256_cmpgt_epi32(vadd, limit);  /* 0xFFFFFFFF for element > 65535 */
+        int ll_res = _mm256_movemask_epi8(ll_cmp);
+
+        /* Shuffle bytes so each half gives us the 8 bytes we need */
+        __m256i vshf = _mm256_shuffle_epi8(vadd, mask);
+        /*
+         * Now:
+         *   Lane0 = seq0's 8 bytes
+         *   Lane1 = 0
+         *   Lane2 = seq1's 8 bytes
+         *   Lane3 = 0
+         */
+
+        /* Permute 64-bit lanes => move Lane2 down into Lane1. */
+        __m256i vperm = _mm256_permute4x64_epi64(vshf, PERM_LANE_0X_E8);
+        /*
+         * Now the lower 16 bytes (Lane0+Lane1) = [seq0, seq1].
+         * The upper 16 bytes are [Lane2, Lane3] = [seq1, 0], but we won't use them.
+         */
+
+        /* Store only the lower 16 bytes => 2 SeqDef (8 bytes each) */
+        _mm_storeu_si128((__m128i *)(void*)&dstSeqs[i], _mm256_castsi256_si128(vperm));
+        /*
+         * This writes out 16 bytes total:
+         * - offset 0..7  => seq0 (offBase, litLength, mlBase)
+         * - offset 8..15 => seq1 (offBase, litLength, mlBase)
+         */
+
+        /* check (unlikely) long lengths > 65535
+         * indices for lengths correspond to bits [4..7], [8..11], [20..23], [24..27]
+         * => combined mask = 0x0FF00FF0
+         */
+        if (UNLIKELY((ll_res & 0x0FF00FF0) != 0)) {
+            /* long length detected: let's figure out which one */
+            if (inSeqs[i].matchLength > 65535+MINMATCH) {
+                assert(longLen == 0);
+                longLen = i + 1;
+            }
+            if (inSeqs[i].litLength > 65535) {
+                assert(longLen == 0);
+                longLen = i + nbSequences + 1;
+            }
+            if (inSeqs[i+1].matchLength > 65535+MINMATCH) {
+                assert(longLen == 0);
+                longLen = i + 1 + 1;
+            }
+            if (inSeqs[i+1].litLength > 65535) {
+                assert(longLen == 0);
+                longLen = i + 1 + nbSequences + 1;
+            }
+        }
+    }
+
+    /* Handle leftover if @nbSequences is odd */
+    if (i < nbSequences) {
+        /* process last sequence */
+        assert(i == nbSequences - 1);
+        dstSeqs[i].offBase = OFFSET_TO_OFFBASE(inSeqs[i].offset);
+        dstSeqs[i].litLength = (U16)inSeqs[i].litLength;
+        dstSeqs[i].mlBase = (U16)(inSeqs[i].matchLength - MINMATCH);
+        /* check (unlikely) long lengths > 65535 */
+        if (UNLIKELY(inSeqs[i].matchLength > 65535+MINMATCH)) {
+            assert(longLen == 0);
+            longLen = i + 1;
+        }
+        if (UNLIKELY(inSeqs[i].litLength > 65535)) {
+            assert(longLen == 0);
+            longLen = i + nbSequences + 1;
+        }
+    }
+
+    return longLen;
+}
+
+/* the vector implementation could also be ported to SSSE3,
+ * but since this implementation is targeting modern systems (>= Sapphire Rapids),
+ * it's not useful to develop and maintain code for older pre-AVX2 platforms */
+
+#else /* no AVX2 */
+
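+/* Note on the return convention shared by both implementations, as decoded
+ * by ZSTD_convertBlockSequences() below (nbSequences == the count passed in):
+ *   longLen == 0                : no litLength or matchLength exceeded 65535
+ *   1 <= longLen <= nbSequences : matchLength of sequence longLen-1 is long
+ *   longLen >  nbSequences      : litLength of sequence longLen-nbSequences-1 is long
+ */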
+static size_t convertSequences_noRepcodes(
+    SeqDef* dstSeqs,
+    const ZSTD_Sequence* inSeqs,
+    size_t nbSequences)
+{
+    size_t longLen = 0;
+    size_t n;
+    for (n=0; n<nbSequences; n++) {
+        dstSeqs[n].offBase = OFFSET_TO_OFFBASE(inSeqs[n].offset);
+        dstSeqs[n].litLength = (U16)inSeqs[n].litLength;
+        dstSeqs[n].mlBase = (U16)(inSeqs[n].matchLength - MINMATCH);
+        /* check for long length > 65535 */
+        if (UNLIKELY(inSeqs[n].matchLength > 65535+MINMATCH)) {
+            assert(longLen == 0);
+            longLen = n + 1;
+        }
+        if (UNLIKELY(inSeqs[n].litLength > 65535)) {
+            assert(longLen == 0);
+            longLen = n + nbSequences + 1;
+        }
+    }
+    return longLen;
+}
+
+#endif
+
+/*
+ * Precondition: Sequences must end on an explicit Block Delimiter
+ * @return: 0 on success, or an error code.
+ * Note: Sequence validation functionality has been disabled (removed).
+ * This is helpful to generate a lean main pipeline, improving performance.
+ * It may be re-inserted later.
+ */
+size_t ZSTD_convertBlockSequences(ZSTD_CCtx* cctx,
+                const ZSTD_Sequence* const inSeqs, size_t nbSequences,
+                int repcodeResolution)
+{
+    Repcodes_t updatedRepcodes;
+    size_t seqNb = 0;
+
+    DEBUGLOG(5, "ZSTD_convertBlockSequences (nbSequences = %zu)", nbSequences);
+
+    RETURN_ERROR_IF(nbSequences >= cctx->seqStore.maxNbSeq, externalSequences_invalid,
+                    "Not enough memory allocated. Try adjusting ZSTD_c_minMatch.");
+
+    ZSTD_memcpy(updatedRepcodes.rep, cctx->blockState.prevCBlock->rep, sizeof(Repcodes_t));
+
+    /* check end condition */
+    assert(nbSequences >= 1);
+    assert(inSeqs[nbSequences-1].matchLength == 0);
+    assert(inSeqs[nbSequences-1].offset == 0);
+
+    /* Convert Sequences from public format to internal format */
+    if (!repcodeResolution) {
+        size_t const longl = convertSequences_noRepcodes(cctx->seqStore.sequencesStart, inSeqs, nbSequences-1);
+        cctx->seqStore.sequences = cctx->seqStore.sequencesStart + nbSequences-1;
+        if (longl) {
+            DEBUGLOG(5, "long length");
+            assert(cctx->seqStore.longLengthType == ZSTD_llt_none);
+            if (longl <= nbSequences-1) {
+                DEBUGLOG(5, "long match length detected at pos %zu", longl-1);
+                cctx->seqStore.longLengthType = ZSTD_llt_matchLength;
+                cctx->seqStore.longLengthPos = (U32)(longl-1);
+            } else {
+                DEBUGLOG(5, "long literals length detected at pos %zu", longl-nbSequences);
+                assert(longl <= 2* (nbSequences-1));
+                cctx->seqStore.longLengthType = ZSTD_llt_literalLength;
+                cctx->seqStore.longLengthPos = (U32)(longl-(nbSequences-1)-1);
+            }
+        }
+    } else {
+        for (seqNb = 0; seqNb < nbSequences - 1 ; seqNb++) {
+            U32 const litLength = inSeqs[seqNb].litLength;
+            U32 const matchLength = inSeqs[seqNb].matchLength;
+            U32 const ll0 = (litLength == 0);
+            U32 const offBase = ZSTD_finalizeOffBase(inSeqs[seqNb].offset, updatedRepcodes.rep, ll0);
+
+            DEBUGLOG(6, "Storing sequence: (of: %u, ml: %u, ll: %u)", offBase, matchLength, litLength);
+            ZSTD_storeSeqOnly(&cctx->seqStore, litLength, offBase, matchLength);
+            ZSTD_updateRep(updatedRepcodes.rep, offBase, ll0);
+        }
+    }
+
+    /* If we skipped repcode search while parsing, we need to update repcodes now */
+    if (!repcodeResolution && nbSequences > 1) {
+        U32* const rep = updatedRepcodes.rep;
+
+        if (nbSequences >= 4) {
+            U32 lastSeqIdx = (U32)nbSequences - 2; /* index of last full sequence */
+            rep[2] = inSeqs[lastSeqIdx - 2].offset;
+            rep[1] = inSeqs[lastSeqIdx - 1].offset;
+            rep[0] = inSeqs[lastSeqIdx].offset;
+        } else if (nbSequences == 3) {
+            rep[2] = rep[0];
+            rep[1] = inSeqs[0].offset;
+            rep[0] = inSeqs[1].offset;
+        } else {
+            assert(nbSequences == 2);
+            rep[2] = rep[1];
+            rep[1] = rep[0];
+            rep[0] = inSeqs[0].offset;
+        }
+    }
+
+    ZSTD_memcpy(cctx->blockState.nextCBlock->rep, updatedRepcodes.rep, sizeof(Repcodes_t));
+
+    return 0;
+}
+
+#if defined(ZSTD_ARCH_X86_AVX2)
+
+BlockSummary ZSTD_get1BlockSummary(const ZSTD_Sequence* seqs, size_t nbSeqs)
+{
+    size_t i;
+    __m256i const zeroVec = _mm256_setzero_si256();
+    __m256i sumVec = zeroVec;     /* accumulates match+lit in 32-bit lanes */
+    ZSTD_ALIGNED(32) U32 tmp[8];  /* temporary buffer for reduction */
+    size_t mSum = 0, lSum = 0;
+    ZSTD_STATIC_ASSERT(sizeof(ZSTD_Sequence) == 16);
+
+    /* Process 2 structs (32 bytes) at a time */
+    for (i = 0; i + 2 <= nbSeqs; i += 2) {
+        /* Load two consecutive ZSTD_Sequence (8×4 = 32 bytes) */
+        __m256i data = _mm256_loadu_si256((const __m256i*)(const void*)&seqs[i]);
+        /* check end of block signal */
+        __m256i cmp = _mm256_cmpeq_epi32(data, zeroVec);
+        int cmp_res = _mm256_movemask_epi8(cmp);
+        /* indices for match lengths correspond to bits [8..11], [24..27]
+         * => combined mask = 0x0F000F00 */
+        ZSTD_STATIC_ASSERT(offsetof(ZSTD_Sequence, matchLength) == 8);
+        if (cmp_res & 0x0F000F00) break;
+        /* Accumulate in sumVec */
+        sumVec = _mm256_add_epi32(sumVec, data);
+    }
+
+    /* Horizontal reduction */
+    _mm256_store_si256((__m256i*)tmp, sumVec);
+    lSum = tmp[1] + tmp[5];
+    mSum = tmp[2] + tmp[6];
+
+    /* Handle the leftover */
+    for (; i < nbSeqs; i++) {
+        lSum += seqs[i].litLength;
+        mSum += seqs[i].matchLength;
+        if (seqs[i].matchLength == 0) break;  /* end of block */
+    }
+
+    if (i==nbSeqs) {
+        /* reaching end of sequences: end of block signal was not present */
+        BlockSummary bs;
+        bs.nbSequences = ERROR(externalSequences_invalid);
+        return bs;
+    }
+    {   BlockSummary bs;
+        bs.nbSequences = i+1;
+        bs.blockSize = lSum + mSum;
+        bs.litSize = lSum;
+        return bs;
+    }
+}
+
+#else
+
+BlockSummary ZSTD_get1BlockSummary(const ZSTD_Sequence* seqs, size_t nbSeqs)
+{
+    size_t totalMatchSize = 0;
+    size_t litSize = 0;
+    size_t n;
+    assert(seqs);
+    for (n=0; n<nbSeqs; n++) {
+        totalMatchSize += seqs[n].matchLength;
+        litSize += seqs[n].litLength;
+        if (seqs[n].matchLength == 0) {
+            assert(seqs[n].offset == 0);
+            break;
+        }
+    }
+    if (n==nbSeqs) {
+        BlockSummary bs;
+        bs.nbSequences = ERROR(externalSequences_invalid);
+        return bs;
+    }
+    {   BlockSummary bs;
+        bs.nbSequences = n+1;
+        bs.blockSize = litSize + totalMatchSize;
+        bs.litSize = litSize;
+        return bs;
+    }
+}
+
+#endif
+
+static size_t
+ZSTD_compressSequencesAndLiterals_internal(ZSTD_CCtx* cctx,
+                void* dst, size_t dstCapacity,
+                const ZSTD_Sequence* inSeqs, size_t nbSequences,
+                const void* literals, size_t litSize, size_t srcSize)
+{
+    size_t remaining = srcSize;
+    size_t cSize = 0;
+    BYTE* op = (BYTE*)dst;
+    int const repcodeResolution = (cctx->appliedParams.searchForExternalRepcodes == ZSTD_ps_enable);
+    assert(cctx->appliedParams.searchForExternalRepcodes != ZSTD_ps_auto);
+
+    DEBUGLOG(4, "ZSTD_compressSequencesAndLiterals_internal: nbSeqs=%zu, litSize=%zu", nbSequences, litSize);
+    RETURN_ERROR_IF(nbSequences == 0, externalSequences_invalid, "Requires at least 1 end-of-block");
+
+    /* Special case: empty frame */
+    if ((nbSequences == 1) && (inSeqs[0].litLength == 0)) {
+        U32 const cBlockHeader24 = 1 /* last block */ + (((U32)bt_raw)<<1);
+        RETURN_ERROR_IF(dstCapacity<3, dstSize_tooSmall, "No room for empty frame block header");
+        MEM_writeLE24(op, cBlockHeader24);
+        op += ZSTD_blockHeaderSize;
+        dstCapacity -= ZSTD_blockHeaderSize;
+        cSize += ZSTD_blockHeaderSize;
+    }
+
+    while (nbSequences) {
+        size_t compressedSeqsSize, cBlockSize, conversionStatus;
+        BlockSummary const block = ZSTD_get1BlockSummary(inSeqs, nbSequences);
+        U32 const lastBlock = (block.nbSequences == nbSequences);
+        FORWARD_IF_ERROR(block.nbSequences, "Error while trying to determine nb of sequences for a block");
+        assert(block.nbSequences <= nbSequences);
+        RETURN_ERROR_IF(block.litSize > litSize, externalSequences_invalid, "discrepancy: Sequences require more literals than present in buffer");
+        ZSTD_resetSeqStore(&cctx->seqStore);
+
+        conversionStatus = ZSTD_convertBlockSequences(cctx,
+                            inSeqs, block.nbSequences,
+                            repcodeResolution);
+        FORWARD_IF_ERROR(conversionStatus, "Bad sequence conversion");
+        inSeqs +=
block.nbSequences; + nbSequences -=3D block.nbSequences; + remaining -=3D block.blockSize; + + /* Note: when blockSize is very small, other variant send it uncom= pressed. + * Here, we still send the sequences, because we don't have the or= iginal source to send it uncompressed. + * One could imagine in theory reproducing the source from the seq= uences, + * but that's complex and costly memory intensive, and goes agains= t the objectives of this variant. */ + + RETURN_ERROR_IF(dstCapacity < ZSTD_blockHeaderSize, dstSize_tooSma= ll, "not enough dstCapacity to write a new compressed block"); + + compressedSeqsSize =3D ZSTD_entropyCompressSeqStore_internal( + op + ZSTD_blockHeaderSize /* Leave space f= or block header */, dstCapacity - ZSTD_blockHeaderSize, + literals, block.litSize, + &cctx->seqStore, + &cctx->blockState.prevCBlock->entropy, &cc= tx->blockState.nextCBlock->entropy, + &cctx->appliedParams, + cctx->tmpWorkspace, cctx->tmpWkspSize /* s= tatically allocated in resetCCtx */, + cctx->bmi2); + FORWARD_IF_ERROR(compressedSeqsSize, "Compressing sequences of blo= ck failed"); + /* note: the spec forbids for any compressed block to be larger th= an maximum block size */ + if (compressedSeqsSize > cctx->blockSizeMax) compressedSeqsSize = =3D 0; + DEBUGLOG(5, "Compressed sequences size: %zu", compressedSeqsSize); + litSize -=3D block.litSize; + literals =3D (const char*)literals + block.litSize; + + /* Note: difficult to check source for RLE block when only Literal= s are provided, + * but it could be considered from analyzing the sequence directly= */ + + if (compressedSeqsSize =3D=3D 0) { + /* Sending uncompressed blocks is out of reach, because the so= urce is not provided. + * In theory, one could use the sequences to regenerate the so= urce, like a decompressor, + * but it's complex, and memory hungry, killing the purpose of= this variant. + * Current outcome: generate an error code. 
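+             * (Callers who cannot guarantee compressible input may prefer
+             * ZSTD_compressSequences(), which still has the source buffer
+             * available and can emit raw blocks as a fallback.)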
+ */ + RETURN_ERROR(cannotProduce_uncompressedBlock, "ZSTD_compressSe= quencesAndLiterals cannot generate an uncompressed block"); + } else { + U32 cBlockHeader; + assert(compressedSeqsSize > 1); /* no RLE */ + /* Error checking and repcodes update */ + ZSTD_blockState_confirmRepcodesAndEntropyTables(&cctx->blockSt= ate); + if (cctx->blockState.prevCBlock->entropy.fse.offcode_repeatMod= e =3D=3D FSE_repeat_valid) + cctx->blockState.prevCBlock->entropy.fse.offcode_repeatMod= e =3D FSE_repeat_check; + + /* Write block header into beginning of block*/ + cBlockHeader =3D lastBlock + (((U32)bt_compressed)<<1) + (U32)= (compressedSeqsSize << 3); + MEM_writeLE24(op, cBlockHeader); + cBlockSize =3D ZSTD_blockHeaderSize + compressedSeqsSize; + DEBUGLOG(5, "Writing out compressed block, size: %zu", cBlockS= ize); + } + + cSize +=3D cBlockSize; + op +=3D cBlockSize; + dstCapacity -=3D cBlockSize; + cctx->isFirstBlock =3D 0; + DEBUGLOG(5, "cSize running total: %zu (remaining dstCapacity=3D%zu= )", cSize, dstCapacity); + + if (lastBlock) { + assert(nbSequences =3D=3D 0); + break; + } + } + + RETURN_ERROR_IF(litSize !=3D 0, externalSequences_invalid, "literals m= ust be entirely and exactly consumed"); + RETURN_ERROR_IF(remaining !=3D 0, externalSequences_invalid, "Sequence= s must represent a total of exactly srcSize=3D%zu", srcSize); + DEBUGLOG(4, "cSize final total: %zu", cSize); + return cSize; +} + +size_t +ZSTD_compressSequencesAndLiterals(ZSTD_CCtx* cctx, + void* dst, size_t dstCapacity, + const ZSTD_Sequence* inSeqs, size_t inSeqsSize, + const void* literals, size_t litSize, size_t litCapaci= ty, + size_t decompressedSize) +{ + BYTE* op =3D (BYTE*)dst; + size_t cSize =3D 0; + + /* Transparent initialization stage, same as compressStream2() */ + DEBUGLOG(4, "ZSTD_compressSequencesAndLiterals (dstCapacity=3D%zu)", d= stCapacity); + assert(cctx !=3D NULL); + if (litCapacity < litSize) { + RETURN_ERROR(workSpace_tooSmall, "literals buffer is not large eno= ugh: must be at least 8 bytes larger than litSize (risk of read out-of-boun= d)"); + } + FORWARD_IF_ERROR(ZSTD_CCtx_init_compressStream2(cctx, ZSTD_e_end, deco= mpressedSize), "CCtx initialization failed"); + + if (cctx->appliedParams.blockDelimiters =3D=3D ZSTD_sf_noBlockDelimite= rs) { + RETURN_ERROR(frameParameter_unsupported, "This mode is only compat= ible with explicit delimiters"); + } + if (cctx->appliedParams.validateSequences) { + RETURN_ERROR(parameter_unsupported, "This mode is not compatible w= ith Sequence validation"); + } + if (cctx->appliedParams.fParams.checksumFlag) { + RETURN_ERROR(frameParameter_unsupported, "this mode is not compati= ble with frame checksum"); + } + + /* Begin writing output, starting with frame header */ + { size_t const frameHeaderSize =3D ZSTD_writeFrameHeader(op, dstCapa= city, + &cctx->appliedParams, decompressedSize, cctx->dictID); + op +=3D frameHeaderSize; + assert(frameHeaderSize <=3D dstCapacity); + dstCapacity -=3D frameHeaderSize; + cSize +=3D frameHeaderSize; + } + + /* Now generate compressed blocks */ + { size_t const cBlocksSize =3D ZSTD_compressSequencesAndLiterals_int= ernal(cctx, + op, dstCapacity, + inSeqs, inSeqsSize, + literals, litSize, decompresse= dSize); + FORWARD_IF_ERROR(cBlocksSize, "Compressing blocks failed!"); + cSize +=3D cBlocksSize; + assert(cBlocksSize <=3D dstCapacity); + dstCapacity -=3D cBlocksSize; + } + + DEBUGLOG(4, "Final compressed size: %zu", cSize); return cSize; } =20 /*=3D=3D=3D=3D=3D=3D Finalize =3D=3D=3D=3D=3D=3D*/ =20 +static ZSTD_inBuffer 
inBuffer_forEndFlush(const ZSTD_CStream* zcs) +{ + const ZSTD_inBuffer nullInput =3D { NULL, 0, 0 }; + const int stableInput =3D (zcs->appliedParams.inBufferMode =3D=3D ZSTD= _bm_stable); + return stableInput ? zcs->expectedInBuffer : nullInput; +} + /*! ZSTD_flushStream() : * @return : amount of data remaining to flush */ size_t ZSTD_flushStream(ZSTD_CStream* zcs, ZSTD_outBuffer* output) { - ZSTD_inBuffer input =3D { NULL, 0, 0 }; + ZSTD_inBuffer input =3D inBuffer_forEndFlush(zcs); + input.size =3D input.pos; /* do not ingest more input during flush */ return ZSTD_compressStream2(zcs, output, &input, ZSTD_e_flush); } =20 - size_t ZSTD_endStream(ZSTD_CStream* zcs, ZSTD_outBuffer* output) { - ZSTD_inBuffer input =3D { NULL, 0, 0 }; + ZSTD_inBuffer input =3D inBuffer_forEndFlush(zcs); size_t const remainingToFlush =3D ZSTD_compressStream2(zcs, output, &i= nput, ZSTD_e_end); - FORWARD_IF_ERROR( remainingToFlush , "ZSTD_compressStream2 failed"); + FORWARD_IF_ERROR(remainingToFlush , "ZSTD_compressStream2(,,ZSTD_e_end= ) failed"); if (zcs->appliedParams.nbWorkers > 0) return remainingToFlush; /* mi= nimal estimation */ /* single thread mode : attempt to calculate remaining to flush more p= recisely */ { size_t const lastBlockSize =3D zcs->frameEnded ? 0 : ZSTD_BLOCKHEA= DERSIZE; @@ -6046,7 +7522,7 @@ static void ZSTD_dedicatedDictSearch_revertCParams( } } =20 -static U64 ZSTD_getCParamRowSize(U64 srcSizeHint, size_t dictSize, ZSTD_cP= aramMode_e mode) +static U64 ZSTD_getCParamRowSize(U64 srcSizeHint, size_t dictSize, ZSTD_CP= aramMode_e mode) { switch (mode) { case ZSTD_cpm_unknown: @@ -6070,8 +7546,8 @@ static U64 ZSTD_getCParamRowSize(U64 srcSizeHint, siz= e_t dictSize, ZSTD_cParamMo * @return ZSTD_compressionParameters structure for a selected compression= level, srcSize and dictSize. * Note: srcSizeHint 0 means 0, use ZSTD_CONTENTSIZE_UNKNOWN for unknown. * Use dictSize =3D=3D 0 for unknown or unused. - * Note: `mode` controls how we treat the `dictSize`. See docs for `ZSTD_= cParamMode_e`. */ -static ZSTD_compressionParameters ZSTD_getCParams_internal(int compression= Level, unsigned long long srcSizeHint, size_t dictSize, ZSTD_cParamMode_e m= ode) + * Note: `mode` controls how we treat the `dictSize`. See docs for `ZSTD_= CParamMode_e`. */ +static ZSTD_compressionParameters ZSTD_getCParams_internal(int compression= Level, unsigned long long srcSizeHint, size_t dictSize, ZSTD_CParamMode_e m= ode) { U64 const rSize =3D ZSTD_getCParamRowSize(srcSizeHint, dictSize, mode); U32 const tableID =3D (rSize <=3D 256 KB) + (rSize <=3D 128 KB) + (rSi= ze <=3D 16 KB); @@ -6092,7 +7568,7 @@ static ZSTD_compressionParameters ZSTD_getCParams_int= ernal(int compressionLevel, cp.targetLength =3D (unsigned)(-clampedCompressionLevel); } /* refine parameters based on srcSize & dictSize */ - return ZSTD_adjustCParams_internal(cp, srcSizeHint, dictSize, mode= ); + return ZSTD_adjustCParams_internal(cp, srcSizeHint, dictSize, mode= , ZSTD_ps_auto); } } =20 @@ -6109,7 +7585,9 @@ ZSTD_compressionParameters ZSTD_getCParams(int compre= ssionLevel, unsigned long l * same idea as ZSTD_getCParams() * @return a `ZSTD_parameters` structure (instead of `ZSTD_compressionPara= meters`). 
* Fields of `ZSTD_frameParameters` are set to default values */ -static ZSTD_parameters ZSTD_getParams_internal(int compressionLevel, unsig= ned long long srcSizeHint, size_t dictSize, ZSTD_cParamMode_e mode) { +static ZSTD_parameters +ZSTD_getParams_internal(int compressionLevel, unsigned long long srcSizeHi= nt, size_t dictSize, ZSTD_CParamMode_e mode) +{ ZSTD_parameters params; ZSTD_compressionParameters const cParams =3D ZSTD_getCParams_internal(= compressionLevel, srcSizeHint, dictSize, mode); DEBUGLOG(5, "ZSTD_getParams (cLevel=3D%i)", compressionLevel); @@ -6123,7 +7601,34 @@ static ZSTD_parameters ZSTD_getParams_internal(int c= ompressionLevel, unsigned lo * same idea as ZSTD_getCParams() * @return a `ZSTD_parameters` structure (instead of `ZSTD_compressionPara= meters`). * Fields of `ZSTD_frameParameters` are set to default values */ -ZSTD_parameters ZSTD_getParams(int compressionLevel, unsigned long long sr= cSizeHint, size_t dictSize) { +ZSTD_parameters ZSTD_getParams(int compressionLevel, unsigned long long sr= cSizeHint, size_t dictSize) +{ if (srcSizeHint =3D=3D 0) srcSizeHint =3D ZSTD_CONTENTSIZE_UNKNOWN; return ZSTD_getParams_internal(compressionLevel, srcSizeHint, dictSize= , ZSTD_cpm_unknown); } + +void ZSTD_registerSequenceProducer( + ZSTD_CCtx* zc, + void* extSeqProdState, + ZSTD_sequenceProducer_F extSeqProdFunc) +{ + assert(zc !=3D NULL); + ZSTD_CCtxParams_registerSequenceProducer( + &zc->requestedParams, extSeqProdState, extSeqProdFunc + ); +} + +void ZSTD_CCtxParams_registerSequenceProducer( + ZSTD_CCtx_params* params, + void* extSeqProdState, + ZSTD_sequenceProducer_F extSeqProdFunc) +{ + assert(params !=3D NULL); + if (extSeqProdFunc !=3D NULL) { + params->extSeqProdFunc =3D extSeqProdFunc; + params->extSeqProdState =3D extSeqProdState; + } else { + params->extSeqProdFunc =3D NULL; + params->extSeqProdState =3D NULL; + } +} diff --git a/lib/zstd/compress/zstd_compress_internal.h b/lib/zstd/compress= /zstd_compress_internal.h index 71697a11ae30..b10978385876 100644 --- a/lib/zstd/compress/zstd_compress_internal.h +++ b/lib/zstd/compress/zstd_compress_internal.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -20,7 +21,8 @@ ***************************************/ #include "../common/zstd_internal.h" #include "zstd_cwksp.h" - +#include "../common/bits.h" /* ZSTD_highbit32, ZSTD_NbCommonBytes */ +#include "zstd_preSplit.h" /* ZSTD_SLIPBLOCK_WORKSPACESIZE */ =20 /*-************************************* * Constants @@ -32,7 +34,7 @@ It's not a big deal though : candid= ate will just be sorted again. Additionally, candidate position 1 = will be lost. But candidate 1 cannot hide a large= tree of candidates, so it's a minimal loss. - The benefit is that ZSTD_DUBT_UNSOR= TED_MARK cannot be mishandled after table re-use with a different strategy. + The benefit is that ZSTD_DUBT_UNSOR= TED_MARK cannot be mishandled after table reuse with a different strategy. 
This constant is required by ZSTD_c= ompressBlock_btlazy2() and ZSTD_reduceTable_internal() */ =20 =20 @@ -75,6 +77,70 @@ typedef struct { ZSTD_fseCTables_t fse; } ZSTD_entropyCTables_t; =20 +/* ********************************************* +* Sequences * +***********************************************/ +typedef struct SeqDef_s { + U32 offBase; /* offBase =3D=3D Offset + ZSTD_REP_NUM, or repcode 1,2= ,3 */ + U16 litLength; + U16 mlBase; /* mlBase =3D=3D matchLength - MINMATCH */ +} SeqDef; + +/* Controls whether seqStore has a single "long" litLength or matchLength.= See SeqStore_t. */ +typedef enum { + ZSTD_llt_none =3D 0, /* no longLengthType */ + ZSTD_llt_literalLength =3D 1, /* represents a long literal */ + ZSTD_llt_matchLength =3D 2 /* represents a long match */ +} ZSTD_longLengthType_e; + +typedef struct { + SeqDef* sequencesStart; + SeqDef* sequences; /* ptr to end of sequences */ + BYTE* litStart; + BYTE* lit; /* ptr to end of literals */ + BYTE* llCode; + BYTE* mlCode; + BYTE* ofCode; + size_t maxNbSeq; + size_t maxNbLit; + + /* longLengthPos and longLengthType to allow us to represent either a = single litLength or matchLength + * in the seqStore that has a value larger than U16 (if it exists). To= do so, we increment + * the existing value of the litLength or matchLength by 0x10000. + */ + ZSTD_longLengthType_e longLengthType; + U32 longLengthPos; /* Index of the sequence to appl= y long length modification to */ +} SeqStore_t; + +typedef struct { + U32 litLength; + U32 matchLength; +} ZSTD_SequenceLength; + +/* + * Returns the ZSTD_SequenceLength for the given sequences. It handles the= decoding of long sequences + * indicated by longLengthPos and longLengthType, and adds MINMATCH back t= o matchLength. + */ +MEM_STATIC ZSTD_SequenceLength ZSTD_getSequenceLength(SeqStore_t const* se= qStore, SeqDef const* seq) +{ + ZSTD_SequenceLength seqLen; + seqLen.litLength =3D seq->litLength; + seqLen.matchLength =3D seq->mlBase + MINMATCH; + if (seqStore->longLengthPos =3D=3D (U32)(seq - seqStore->sequencesStar= t)) { + if (seqStore->longLengthType =3D=3D ZSTD_llt_literalLength) { + seqLen.litLength +=3D 0x10000; + } + if (seqStore->longLengthType =3D=3D ZSTD_llt_matchLength) { + seqLen.matchLength +=3D 0x10000; + } + } + return seqLen; +} + +const SeqStore_t* ZSTD_getSeqStore(const ZSTD_CCtx* ctx); /* compress & = dictBuilder */ +int ZSTD_seqToCodes(const SeqStore_t* seqStorePtr); /* compress, dictBui= lder, decodeCorpus (shouldn't get its definition from here) */ + + /* ********************************************* * Entropy buffer statistics structs and funcs * ***********************************************/ @@ -84,7 +150,7 @@ typedef struct { * hufDesSize refers to the size of huffman tree description in bytes. * This metadata is populated in ZSTD_buildBlockEntropyStats_literals() */ typedef struct { - symbolEncodingType_e hType; + SymbolEncodingType_e hType; BYTE hufDesBuffer[ZSTD_MAX_HUF_HEADER_SIZE]; size_t hufDesSize; } ZSTD_hufCTablesMetadata_t; @@ -95,9 +161,9 @@ typedef struct { * fseTablesSize refers to the size of fse tables in bytes. * This metadata is populated in ZSTD_buildBlockEntropyStats_sequences() = */ typedef struct { - symbolEncodingType_e llType; - symbolEncodingType_e ofType; - symbolEncodingType_e mlType; + SymbolEncodingType_e llType; + SymbolEncodingType_e ofType; + SymbolEncodingType_e mlType; BYTE fseTablesBuffer[ZSTD_MAX_FSE_HEADERS_SIZE]; size_t fseTablesSize; size_t lastCountSize; /* This is to account for bug in 1.3.4. 
More det= ail in ZSTD_entropyCompressSeqStore_internal() */ @@ -111,12 +177,13 @@ typedef struct { /* ZSTD_buildBlockEntropyStats() : * Builds entropy for the block. * @return : 0 on success or error code */ -size_t ZSTD_buildBlockEntropyStats(seqStore_t* seqStorePtr, - const ZSTD_entropyCTables_t* prevEntropy, - ZSTD_entropyCTables_t* nextEntropy, - const ZSTD_CCtx_params* cctxParams, - ZSTD_entropyCTablesMetadata_t* entropyM= etadata, - void* workspace, size_t wkspSize); +size_t ZSTD_buildBlockEntropyStats( + const SeqStore_t* seqStorePtr, + const ZSTD_entropyCTables_t* prevEntropy, + ZSTD_entropyCTables_t* nextEntropy, + const ZSTD_CCtx_params* cctxParams, + ZSTD_entropyCTablesMetadata_t* entropyMetadata, + void* workspace, size_t wkspSize); =20 /* ******************************* * Compression internals structs * @@ -140,28 +207,29 @@ typedef struct { stopped. posInSequence <=3D seq[pos].litLength = + seq[pos].matchLength */ size_t size; /* The number of sequences. <=3D capacity. */ size_t capacity; /* The capacity starting from `seq` pointer */ -} rawSeqStore_t; +} RawSeqStore_t; =20 -UNUSED_ATTR static const rawSeqStore_t kNullRawSeqStore =3D {NULL, 0, 0, 0= , 0}; +UNUSED_ATTR static const RawSeqStore_t kNullRawSeqStore =3D {NULL, 0, 0, 0= , 0}; =20 typedef struct { - int price; - U32 off; - U32 mlen; - U32 litlen; - U32 rep[ZSTD_REP_NUM]; + int price; /* price from beginning of segment to this position */ + U32 off; /* offset of previous match */ + U32 mlen; /* length of previous match */ + U32 litlen; /* nb of literals since previous match */ + U32 rep[ZSTD_REP_NUM]; /* offset history after previous match */ } ZSTD_optimal_t; =20 typedef enum { zop_dynamic=3D0, zop_predef } ZSTD_OptPrice_e; =20 +#define ZSTD_OPT_SIZE (ZSTD_OPT_NUM+3) typedef struct { /* All tables are allocated inside cctx->workspace by ZSTD_resetCCtx_i= nternal() */ unsigned* litFreq; /* table of literals statistics, of size = 256 */ unsigned* litLengthFreq; /* table of litLength statistics, of size= (MaxLL+1) */ unsigned* matchLengthFreq; /* table of matchLength statistics, of si= ze (MaxML+1) */ unsigned* offCodeFreq; /* table of offCode statistics, of size (= MaxOff+1) */ - ZSTD_match_t* matchTable; /* list of found matches, of size ZSTD_OP= T_NUM+1 */ - ZSTD_optimal_t* priceTable; /* All positions tracked by optimal parse= r, of size ZSTD_OPT_NUM+1 */ + ZSTD_match_t* matchTable; /* list of found matches, of size ZSTD_OP= T_SIZE */ + ZSTD_optimal_t* priceTable; /* All positions tracked by optimal parse= r, of size ZSTD_OPT_SIZE */ =20 U32 litSum; /* nb of literals */ U32 litLengthSum; /* nb of litLength codes */ @@ -173,7 +241,7 @@ typedef struct { U32 offCodeSumBasePrice; /* to compare to log2(offreq) */ ZSTD_OptPrice_e priceType; /* prices can be determined dynamically, = or follow a pre-defined cost structure */ const ZSTD_entropyCTables_t* symbolCosts; /* pre-calculated dictionar= y statistics */ - ZSTD_paramSwitch_e literalCompressionMode; + ZSTD_ParamSwitch_e literalCompressionMode; } optState_t; =20 typedef struct { @@ -195,11 +263,11 @@ typedef struct { =20 #define ZSTD_WINDOW_START_INDEX 2 =20 -typedef struct ZSTD_matchState_t ZSTD_matchState_t; +typedef struct ZSTD_MatchState_t ZSTD_MatchState_t; =20 #define ZSTD_ROW_HASH_CACHE_SIZE 8 /* Size of prefetching hash cache= for row-based matchfinder */ =20 -struct ZSTD_matchState_t { +struct ZSTD_MatchState_t { ZSTD_window_t window; /* State for window round buffer management */ U32 loadedDictEnd; /* index of end of dictionary, within context'= s 
referential. * When loadedDictEnd !=3D 0, a dictionary is = in use, and still valid. @@ -212,28 +280,42 @@ struct ZSTD_matchState_t { U32 hashLog3; /* dispatch table for matches of len=3D=3D3 : = larger =3D=3D faster, more memory */ =20 U32 rowHashLog; /* For row-based matchfinder:= Hashlog based on nb of rows in the hashTable.*/ - U16* tagTable; /* For row-based matchFinder:= A row-based table containing the hashes and head index. */ + BYTE* tagTable; /* For row-based matchFinder:= A row-based table containing the hashes and head index. */ U32 hashCache[ZSTD_ROW_HASH_CACHE_SIZE]; /* For row-based matchFinder:= a cache of hashes to improve speed */ + U64 hashSalt; /* For row-based matchFinder:= salts the hash for reuse of tag table */ + U32 hashSaltEntropy; /* For row-based matchFinder:= collects entropy for salt generation */ =20 U32* hashTable; U32* hashTable3; U32* chainTable; =20 - U32 forceNonContiguous; /* Non-zero if we should force non-contiguous = load for the next window update. */ + int forceNonContiguous; /* Non-zero if we should force non-contiguous = load for the next window update. */ =20 int dedicatedDictSearch; /* Indicates whether this matchState is usin= g the * dedicated dictionary search structure. */ optState_t opt; /* optimal parser state */ - const ZSTD_matchState_t* dictMatchState; + const ZSTD_MatchState_t* dictMatchState; ZSTD_compressionParameters cParams; - const rawSeqStore_t* ldmSeqStore; + const RawSeqStore_t* ldmSeqStore; + + /* Controls prefetching in some dictMatchState matchfinders. + * This behavior is controlled from the cctx ms. + * This parameter has no effect in the cdict ms. */ + int prefetchCDictTables; + + /* When =3D=3D 0, lazy match finders insert every position. + * When !=3D 0, lazy match finders only insert positions they search. + * This allows them to skip much faster over incompressible data, + * at a small cost to compression ratio. + */ + int lazySkipping; }; =20 typedef struct { ZSTD_compressedBlockState_t* prevCBlock; ZSTD_compressedBlockState_t* nextCBlock; - ZSTD_matchState_t matchState; + ZSTD_MatchState_t matchState; } ZSTD_blockState_t; =20 typedef struct { @@ -260,7 +342,7 @@ typedef struct { } ldmState_t; =20 typedef struct { - ZSTD_paramSwitch_e enableLdm; /* ZSTD_ps_enable to enable LDM. ZSTD_ps= _auto by default */ + ZSTD_ParamSwitch_e enableLdm; /* ZSTD_ps_enable to enable LDM. ZSTD_ps= _auto by default */ U32 hashLog; /* Log size of hashTable */ U32 bucketSizeLog; /* Log bucket size for collision resolution, a= t most 8 */ U32 minMatchLength; /* Minimum match length */ @@ -291,7 +373,7 @@ struct ZSTD_CCtx_params_s { * There is no guarantee that hint is close= to actual source size */ =20 ZSTD_dictAttachPref_e attachDictPref; - ZSTD_paramSwitch_e literalCompressionMode; + ZSTD_ParamSwitch_e literalCompressionMode; =20 /* Multithreading: used to pass parameters to mtctx */ int nbWorkers; @@ -310,24 +392,54 @@ struct ZSTD_CCtx_params_s { ZSTD_bufferMode_e outBufferMode; =20 /* Sequence compression API */ - ZSTD_sequenceFormat_e blockDelimiters; + ZSTD_SequenceFormat_e blockDelimiters; int validateSequences; =20 - /* Block splitting */ - ZSTD_paramSwitch_e useBlockSplitter; + /* Block splitting + * @postBlockSplitter executes split analysis after sequences are prod= uced, + * it's more accurate but consumes more resources. + * @preBlockSplitter_level splits before knowing sequences, + * it's more approximative but also cheaper. + * Valid @preBlockSplitter_level values range from 0 to 6 (included). 
+ * 0 means auto, 1 means do not split, + * then levels are sorted in increasing cpu budget, from 2 (fastest) t= o 6 (slowest). + * Highest @preBlockSplitter_level combines well with @postBlockSplitt= er. + */ + ZSTD_ParamSwitch_e postBlockSplitter; + int preBlockSplitter_level; + + /* Adjust the max block size*/ + size_t maxBlockSize; =20 /* Param for deciding whether to use row-based matchfinder */ - ZSTD_paramSwitch_e useRowMatchFinder; + ZSTD_ParamSwitch_e useRowMatchFinder; =20 /* Always load a dictionary in ext-dict mode (not prefix mode)? */ int deterministicRefPrefix; =20 /* Internal use, for createCCtxParams() and freeCCtxParams() only */ ZSTD_customMem customMem; + + /* Controls prefetching in some dictMatchState matchfinders */ + ZSTD_ParamSwitch_e prefetchCDictTables; + + /* Controls whether zstd will fall back to an internal matchfinder + * if the external matchfinder returns an error code. */ + int enableMatchFinderFallback; + + /* Parameters for the external sequence producer API. + * Users set these parameters through ZSTD_registerSequenceProducer(). + * It is not possible to set these parameters individually through the= public API. */ + void* extSeqProdState; + ZSTD_sequenceProducer_F extSeqProdFunc; + + /* Controls repcode search in external sequence parsing */ + ZSTD_ParamSwitch_e searchForExternalRepcodes; }; /* typedef'd to ZSTD_CCtx_params within "zstd.h" */ =20 #define COMPRESS_SEQUENCES_WORKSPACE_SIZE (sizeof(unsigned) * (MaxSeq + 2)) #define ENTROPY_WORKSPACE_SIZE (HUF_WORKSPACE_SIZE + COMPRESS_SEQUENCES_WO= RKSPACE_SIZE) +#define TMP_WORKSPACE_SIZE (MAX(ENTROPY_WORKSPACE_SIZE, ZSTD_SLIPBLOCK_WOR= KSPACESIZE)) =20 /* * Indicates whether this compression proceeds directly from user-provided @@ -345,11 +457,11 @@ typedef enum { */ #define ZSTD_MAX_NB_BLOCK_SPLITS 196 typedef struct { - seqStore_t fullSeqStoreChunk; - seqStore_t firstHalfSeqStore; - seqStore_t secondHalfSeqStore; - seqStore_t currSeqStore; - seqStore_t nextSeqStore; + SeqStore_t fullSeqStoreChunk; + SeqStore_t firstHalfSeqStore; + SeqStore_t secondHalfSeqStore; + SeqStore_t currSeqStore; + SeqStore_t nextSeqStore; =20 U32 partitions[ZSTD_MAX_NB_BLOCK_SPLITS]; ZSTD_entropyCTablesMetadata_t entropyMetadata; @@ -366,7 +478,7 @@ struct ZSTD_CCtx_s { size_t dictContentSize; =20 ZSTD_cwksp workspace; /* manages buffer for dynamic allocations */ - size_t blockSize; + size_t blockSizeMax; unsigned long long pledgedSrcSizePlusOne; /* this way, 0 (default) = =3D=3D unknown */ unsigned long long consumedSrcSize; unsigned long long producedCSize; @@ -378,13 +490,14 @@ struct ZSTD_CCtx_s { int isFirstBlock; int initialized; =20 - seqStore_t seqStore; /* sequences storage ptrs */ + SeqStore_t seqStore; /* sequences storage ptrs */ ldmState_t ldmState; /* long distance matching state */ rawSeq* ldmSequences; /* Storage for the ldm output sequences */ size_t maxNbLdmSequences; - rawSeqStore_t externSeqStore; /* Mutable reference to external sequenc= es */ + RawSeqStore_t externSeqStore; /* Mutable reference to external sequenc= es */ ZSTD_blockState_t blockState; - U32* entropyWorkspace; /* entropy workspace of ENTROPY_WORKSPACE_SIZE= bytes */ + void* tmpWorkspace; /* used as substitute of stack space - must be al= igned for S64 type */ + size_t tmpWkspSize; =20 /* Whether we are streaming or not */ ZSTD_buffered_policy_e bufferedPolicy; @@ -404,6 +517,7 @@ struct ZSTD_CCtx_s { =20 /* Stable in/out buffer verification */ ZSTD_inBuffer expectedInBuffer; + size_t stableIn_notConsumed; /* nb bytes within stable input 
buffer th= at are said to be consumed but are not */ size_t expectedOutBufferSize; =20 /* Dictionary */ @@ -417,9 +531,14 @@ struct ZSTD_CCtx_s { =20 /* Workspace for block splitter */ ZSTD_blockSplitCtx blockSplitCtx; + + /* Buffer for output from external sequence producer */ + ZSTD_Sequence* extSeqBuf; + size_t extSeqBufCapacity; }; =20 typedef enum { ZSTD_dtlm_fast, ZSTD_dtlm_full } ZSTD_dictTableLoadMethod_e; +typedef enum { ZSTD_tfp_forCCtx, ZSTD_tfp_forCDict } ZSTD_tableFillPurpose= _e; =20 typedef enum { ZSTD_noDict =3D 0, @@ -441,17 +560,17 @@ typedef enum { * In this mode we take both the source si= ze and the dictionary size * into account when selecting and adjusti= ng the parameters. */ - ZSTD_cpm_unknown =3D 3, /* ZSTD_getCParams, ZSTD_getParams, ZSTD= _adjustParams. + ZSTD_cpm_unknown =3D 3 /* ZSTD_getCParams, ZSTD_getParams, ZSTD= _adjustParams. * We don't know what these parameters are= for. We default to the legacy * behavior of taking both the source size= and the dict size into account * when selecting and adjusting parameters. */ -} ZSTD_cParamMode_e; +} ZSTD_CParamMode_e; =20 -typedef size_t (*ZSTD_blockCompressor) ( - ZSTD_matchState_t* bs, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +typedef size_t (*ZSTD_BlockCompressor_f) ( + ZSTD_MatchState_t* bs, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -ZSTD_blockCompressor ZSTD_selectBlockCompressor(ZSTD_strategy strat, ZSTD_= paramSwitch_e rowMatchfinderMode, ZSTD_dictMode_e dictMode); +ZSTD_BlockCompressor_f ZSTD_selectBlockCompressor(ZSTD_strategy strat, ZST= D_ParamSwitch_e rowMatchfinderMode, ZSTD_dictMode_e dictMode); =20 =20 MEM_STATIC U32 ZSTD_LLcode(U32 litLength) @@ -497,12 +616,33 @@ MEM_STATIC int ZSTD_cParam_withinBounds(ZSTD_cParamet= er cParam, int value) return 1; } =20 +/* ZSTD_selectAddr: + * @return index >=3D lowLimit ? candidate : backup, + * tries to force branchless codegen. */ +MEM_STATIC const BYTE* +ZSTD_selectAddr(U32 index, U32 lowLimit, const BYTE* candidate, const BYTE= * backup) +{ +#if defined(__x86_64__) + __asm__ ( + "cmp %1, %2\n" + "cmova %3, %0\n" + : "+r"(candidate) + : "r"(index), "r"(lowLimit), "r"(backup) + ); + return candidate; +#else + return index >=3D lowLimit ? candidate : backup; +#endif +} + /* ZSTD_noCompressBlock() : * Writes uncompressed block to dst buffer from given src. 
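 * (For reference, a raw block is a 3-byte little-endian header packing
 *  lastBlock, the bt_raw block type, and srcSize<<3, followed by the
 *  srcSize source bytes verbatim.)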
* Returns the size of the block */ -MEM_STATIC size_t ZSTD_noCompressBlock (void* dst, size_t dstCapacity, con= st void* src, size_t srcSize, U32 lastBlock) +MEM_STATIC size_t +ZSTD_noCompressBlock(void* dst, size_t dstCapacity, const void* src, size_= t srcSize, U32 lastBlock) { U32 const cBlockHeader24 =3D lastBlock + (((U32)bt_raw)<<1) + (U32)(sr= cSize << 3); + DEBUGLOG(5, "ZSTD_noCompressBlock (srcSize=3D%zu, dstCapacity=3D%zu)",= srcSize, dstCapacity); RETURN_ERROR_IF(srcSize + ZSTD_blockHeaderSize > dstCapacity, dstSize_tooSmall, "dst buf too small for uncompressed = block"); MEM_writeLE24(dst, cBlockHeader24); @@ -510,7 +650,8 @@ MEM_STATIC size_t ZSTD_noCompressBlock (void* dst, size= _t dstCapacity, const voi return ZSTD_blockHeaderSize + srcSize; } =20 -MEM_STATIC size_t ZSTD_rleCompressBlock (void* dst, size_t dstCapacity, BY= TE src, size_t srcSize, U32 lastBlock) +MEM_STATIC size_t +ZSTD_rleCompressBlock(void* dst, size_t dstCapacity, BYTE src, size_t srcS= ize, U32 lastBlock) { BYTE* const op =3D (BYTE*)dst; U32 const cBlockHeader =3D lastBlock + (((U32)bt_rle)<<1) + (U32)(srcS= ize << 3); @@ -529,7 +670,7 @@ MEM_STATIC size_t ZSTD_minGain(size_t srcSize, ZSTD_str= ategy strat) { U32 const minlog =3D (strat>=3DZSTD_btultra) ? (U32)(strat) - 1 : 6; ZSTD_STATIC_ASSERT(ZSTD_btultra =3D=3D 8); - assert(ZSTD_cParam_withinBounds(ZSTD_c_strategy, strat)); + assert(ZSTD_cParam_withinBounds(ZSTD_c_strategy, (int)strat)); return (srcSize >> minlog) + 2; } =20 @@ -565,29 +706,68 @@ ZSTD_safecopyLiterals(BYTE* op, BYTE const* ip, BYTE = const* const iend, BYTE con while (ip < iend) *op++ =3D *ip++; } =20 -#define ZSTD_REP_MOVE (ZSTD_REP_NUM-1) -#define STORE_REPCODE_1 STORE_REPCODE(1) -#define STORE_REPCODE_2 STORE_REPCODE(2) -#define STORE_REPCODE_3 STORE_REPCODE(3) -#define STORE_REPCODE(r) (assert((r)>=3D1), assert((r)<=3D3), (r)-1) -#define STORE_OFFSET(o) (assert((o)>0), o + ZSTD_REP_MOVE) -#define STORED_IS_OFFSET(o) ((o) > ZSTD_REP_MOVE) -#define STORED_IS_REPCODE(o) ((o) <=3D ZSTD_REP_MOVE) -#define STORED_OFFSET(o) (assert(STORED_IS_OFFSET(o)), (o)-ZSTD_REP_MOVE) -#define STORED_REPCODE(o) (assert(STORED_IS_REPCODE(o)), (o)+1) /* return= s ID 1,2,3 */ -#define STORED_TO_OFFBASE(o) ((o)+1) -#define OFFBASE_TO_STORED(o) ((o)-1) + +#define REPCODE1_TO_OFFBASE REPCODE_TO_OFFBASE(1) +#define REPCODE2_TO_OFFBASE REPCODE_TO_OFFBASE(2) +#define REPCODE3_TO_OFFBASE REPCODE_TO_OFFBASE(3) +#define REPCODE_TO_OFFBASE(r) (assert((r)>=3D1), assert((r)<=3DZSTD_REP_NU= M), (r)) /* accepts IDs 1,2,3 */ +#define OFFSET_TO_OFFBASE(o) (assert((o)>0), o + ZSTD_REP_NUM) +#define OFFBASE_IS_OFFSET(o) ((o) > ZSTD_REP_NUM) +#define OFFBASE_IS_REPCODE(o) ( 1 <=3D (o) && (o) <=3D ZSTD_REP_NUM) +#define OFFBASE_TO_OFFSET(o) (assert(OFFBASE_IS_OFFSET(o)), (o) - ZSTD_RE= P_NUM) +#define OFFBASE_TO_REPCODE(o) (assert(OFFBASE_IS_REPCODE(o)), (o)) /* ret= urns ID 1,2,3 */ + +/*! ZSTD_storeSeqOnly() : + * Store a sequence (litlen, litPtr, offBase and matchLength) into SeqSto= re_t. + * Literals themselves are not copied, but @litPtr is updated. + * @offBase : Users should employ macros REPCODE_TO_OFFBASE() and OFFSET_= TO_OFFBASE(). 
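+ *            Illustrative call (not part of the patch):
+ *            ZSTD_storeSeqOnly(ss, 11, REPCODE1_TO_OFFBASE, 4)
+ *            accounts for 11 literals plus a 4-byte match against
+ *            repcode 1, while OFFSET_TO_OFFBASE(100) would encode a
+ *            raw offset of 100.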
+ * @matchLength : must be >=3D MINMATCH +*/ +HINT_INLINE UNUSED_ATTR void +ZSTD_storeSeqOnly(SeqStore_t* seqStorePtr, + size_t litLength, + U32 offBase, + size_t matchLength) +{ + assert((size_t)(seqStorePtr->sequences - seqStorePtr->sequencesStart) = < seqStorePtr->maxNbSeq); + + /* literal Length */ + assert(litLength <=3D ZSTD_BLOCKSIZE_MAX); + if (UNLIKELY(litLength>0xFFFF)) { + assert(seqStorePtr->longLengthType =3D=3D ZSTD_llt_none); /* there= can only be a single long length */ + seqStorePtr->longLengthType =3D ZSTD_llt_literalLength; + seqStorePtr->longLengthPos =3D (U32)(seqStorePtr->sequences - seqS= torePtr->sequencesStart); + } + seqStorePtr->sequences[0].litLength =3D (U16)litLength; + + /* match offset */ + seqStorePtr->sequences[0].offBase =3D offBase; + + /* match Length */ + assert(matchLength <=3D ZSTD_BLOCKSIZE_MAX); + assert(matchLength >=3D MINMATCH); + { size_t const mlBase =3D matchLength - MINMATCH; + if (UNLIKELY(mlBase>0xFFFF)) { + assert(seqStorePtr->longLengthType =3D=3D ZSTD_llt_none); /* t= here can only be a single long length */ + seqStorePtr->longLengthType =3D ZSTD_llt_matchLength; + seqStorePtr->longLengthPos =3D (U32)(seqStorePtr->sequences - = seqStorePtr->sequencesStart); + } + seqStorePtr->sequences[0].mlBase =3D (U16)mlBase; + } + + seqStorePtr->sequences++; +} =20 /*! ZSTD_storeSeq() : - * Store a sequence (litlen, litPtr, offCode and matchLength) into seqSto= re_t. - * @offBase_minus1 : Users should use employ macros STORE_REPCODE_X and S= TORE_OFFSET(). + * Store a sequence (litlen, litPtr, offBase and matchLength) into SeqSto= re_t. + * @offBase : Users should employ macros REPCODE_TO_OFFBASE() and OFFSET_= TO_OFFBASE(). * @matchLength : must be >=3D MINMATCH - * Allowed to overread literals up to litLimit. + * Allowed to over-read literals up to litLimit. */ HINT_INLINE UNUSED_ATTR void -ZSTD_storeSeq(seqStore_t* seqStorePtr, +ZSTD_storeSeq(SeqStore_t* seqStorePtr, size_t litLength, const BYTE* literals, const BYTE* litLimit, - U32 offBase_minus1, + U32 offBase, size_t matchLength) { BYTE const* const litLimit_w =3D litLimit - WILDCOPY_OVERLENGTH; @@ -596,8 +776,8 @@ ZSTD_storeSeq(seqStore_t* seqStorePtr, static const BYTE* g_start =3D NULL; if (g_start=3D=3DNULL) g_start =3D (const BYTE*)literals; /* note : i= ndex only works for compression within a single segment */ { U32 const pos =3D (U32)((const BYTE*)literals - g_start); - DEBUGLOG(6, "Cpos%7u :%3u literals, match%4u bytes at offCode%7u", - pos, (U32)litLength, (U32)matchLength, (U32)offBase_minus1); + DEBUGLOG(6, "Cpos%7u :%3u literals, match%4u bytes at offBase%7u", + pos, (U32)litLength, (U32)matchLength, (U32)offBase); } #endif assert((size_t)(seqStorePtr->sequences - seqStorePtr->sequencesStart) = < seqStorePtr->maxNbSeq); @@ -607,9 +787,9 @@ ZSTD_storeSeq(seqStore_t* seqStorePtr, assert(literals + litLength <=3D litLimit); if (litEnd <=3D litLimit_w) { /* Common case we can use wildcopy. - * First copy 16 bytes, because literals are likely short. - */ - assert(WILDCOPY_OVERLENGTH >=3D 16); + * First copy 16 bytes, because literals are likely short. 
+ */ + ZSTD_STATIC_ASSERT(WILDCOPY_OVERLENGTH >=3D 16); ZSTD_copy16(seqStorePtr->lit, literals); if (litLength > 16) { ZSTD_wildcopy(seqStorePtr->lit+16, literals+16, (ptrdiff_t)lit= Length-16, ZSTD_no_overlap); @@ -619,44 +799,22 @@ ZSTD_storeSeq(seqStore_t* seqStorePtr, } seqStorePtr->lit +=3D litLength; =20 - /* literal Length */ - if (litLength>0xFFFF) { - assert(seqStorePtr->longLengthType =3D=3D ZSTD_llt_none); /* there= can only be a single long length */ - seqStorePtr->longLengthType =3D ZSTD_llt_literalLength; - seqStorePtr->longLengthPos =3D (U32)(seqStorePtr->sequences - seqS= torePtr->sequencesStart); - } - seqStorePtr->sequences[0].litLength =3D (U16)litLength; - - /* match offset */ - seqStorePtr->sequences[0].offBase =3D STORED_TO_OFFBASE(offBase_minus1= ); - - /* match Length */ - assert(matchLength >=3D MINMATCH); - { size_t const mlBase =3D matchLength - MINMATCH; - if (mlBase>0xFFFF) { - assert(seqStorePtr->longLengthType =3D=3D ZSTD_llt_none); /* t= here can only be a single long length */ - seqStorePtr->longLengthType =3D ZSTD_llt_matchLength; - seqStorePtr->longLengthPos =3D (U32)(seqStorePtr->sequences - = seqStorePtr->sequencesStart); - } - seqStorePtr->sequences[0].mlBase =3D (U16)mlBase; - } - - seqStorePtr->sequences++; + ZSTD_storeSeqOnly(seqStorePtr, litLength, offBase, matchLength); } =20 /* ZSTD_updateRep() : * updates in-place @rep (array of repeat offsets) - * @offBase_minus1 : sum-type, with same numeric representation as ZSTD_st= oreSeq() + * @offBase : sum-type, using numeric representation of ZSTD_storeSeq() */ MEM_STATIC void -ZSTD_updateRep(U32 rep[ZSTD_REP_NUM], U32 const offBase_minus1, U32 const = ll0) +ZSTD_updateRep(U32 rep[ZSTD_REP_NUM], U32 const offBase, U32 const ll0) { - if (STORED_IS_OFFSET(offBase_minus1)) { /* full offset */ + if (OFFBASE_IS_OFFSET(offBase)) { /* full offset */ rep[2] =3D rep[1]; rep[1] =3D rep[0]; - rep[0] =3D STORED_OFFSET(offBase_minus1); + rep[0] =3D OFFBASE_TO_OFFSET(offBase); } else { /* repcode */ - U32 const repCode =3D STORED_REPCODE(offBase_minus1) - 1 + ll0; + U32 const repCode =3D OFFBASE_TO_REPCODE(offBase) - 1 + ll0; if (repCode > 0) { /* note : if repCode=3D=3D0, no change */ U32 const currentOffset =3D (repCode=3D=3DZSTD_REP_NUM) ? (rep= [0] - 1) : rep[repCode]; rep[2] =3D (repCode >=3D 2) ? 
rep[1] : rep[2]; @@ -670,14 +828,14 @@ ZSTD_updateRep(U32 rep[ZSTD_REP_NUM], U32 const offBa= se_minus1, U32 const ll0) =20 typedef struct repcodes_s { U32 rep[3]; -} repcodes_t; +} Repcodes_t; =20 -MEM_STATIC repcodes_t -ZSTD_newRep(U32 const rep[ZSTD_REP_NUM], U32 const offBase_minus1, U32 con= st ll0) +MEM_STATIC Repcodes_t +ZSTD_newRep(U32 const rep[ZSTD_REP_NUM], U32 const offBase, U32 const ll0) { - repcodes_t newReps; + Repcodes_t newReps; ZSTD_memcpy(&newReps, rep, sizeof(newReps)); - ZSTD_updateRep(newReps.rep, offBase_minus1, ll0); + ZSTD_updateRep(newReps.rep, offBase, ll0); return newReps; } =20 @@ -685,59 +843,6 @@ ZSTD_newRep(U32 const rep[ZSTD_REP_NUM], U32 const off= Base_minus1, U32 const ll0 /*-************************************* * Match length counter ***************************************/ -static unsigned ZSTD_NbCommonBytes (size_t val) -{ - if (MEM_isLittleEndian()) { - if (MEM_64bits()) { -# if (__GNUC__ >=3D 4) - return (__builtin_ctzll((U64)val) >> 3); -# else - static const int DeBruijnBytePos[64] =3D { 0, 0, 0, 0, 0, 1, 1= , 2, - 0, 3, 1, 3, 1, 4, 2, = 7, - 0, 2, 3, 6, 1, 5, 3, = 5, - 1, 3, 4, 4, 2, 5, 6, = 7, - 7, 0, 1, 2, 3, 3, 4, = 6, - 2, 6, 5, 5, 3, 4, 5, = 6, - 7, 1, 2, 4, 6, 4, 4, = 5, - 7, 2, 6, 5, 7, 6, 7, = 7 }; - return DeBruijnBytePos[((U64)((val & -(long long)val) * 0x0218= A392CDABBD3FULL)) >> 58]; -# endif - } else { /* 32 bits */ -# if (__GNUC__ >=3D 3) - return (__builtin_ctz((U32)val) >> 3); -# else - static const int DeBruijnBytePos[32] =3D { 0, 0, 3, 0, 3, 1, 3= , 0, - 3, 2, 2, 1, 3, 2, 0, = 1, - 3, 3, 1, 2, 2, 2, 2, = 0, - 3, 1, 2, 0, 1, 0, 1, = 1 }; - return DeBruijnBytePos[((U32)((val & -(S32)val) * 0x077CB531U)= ) >> 27]; -# endif - } - } else { /* Big Endian CPU */ - if (MEM_64bits()) { -# if (__GNUC__ >=3D 4) - return (__builtin_clzll(val) >> 3); -# else - unsigned r; - const unsigned n32 =3D sizeof(size_t)*4; /* calculate this w= ay due to compiler complaining in 32-bits mode */ - if (!(val>>n32)) { r=3D4; } else { r=3D0; val>>=3Dn32; } - if (!(val>>16)) { r+=3D2; val>>=3D8; } else { val>>=3D24; } - r +=3D (!val); - return r; -# endif - } else { /* 32 bits */ -# if (__GNUC__ >=3D 3) - return (__builtin_clz((U32)val) >> 3); -# else - unsigned r; - if (!(val>>16)) { r=3D2; val>>=3D8; } else { r=3D0; val>>=3D24= ; } - r +=3D (!val); - return r; -# endif - } } -} - - MEM_STATIC size_t ZSTD_count(const BYTE* pIn, const BYTE* pMatch, const BY= TE* const pInLimit) { const BYTE* const pStart =3D pIn; @@ -771,8 +876,8 @@ ZSTD_count_2segments(const BYTE* ip, const BYTE* match, size_t const matchLength =3D ZSTD_count(ip, match, vEnd); if (match + matchLength !=3D mEnd) return matchLength; DEBUGLOG(7, "ZSTD_count_2segments: found a 2-parts match (current leng= th=3D=3D%zu)", matchLength); - DEBUGLOG(7, "distance from match beginning to end dictionary =3D %zi",= mEnd - match); - DEBUGLOG(7, "distance from current pos to end buffer =3D %zi", iEnd - = ip); + DEBUGLOG(7, "distance from match beginning to end dictionary =3D %i", = (int)(mEnd - match)); + DEBUGLOG(7, "distance from current pos to end buffer =3D %i", (int)(iE= nd - ip)); DEBUGLOG(7, "next byte : ip=3D=3D%02X, istart=3D=3D%02X", ip[matchLeng= th], *iStart); DEBUGLOG(7, "final match length =3D %zu", matchLength + ZSTD_count(ip+= matchLength, iStart, iEnd)); return matchLength + ZSTD_count(ip+matchLength, iStart, iEnd); @@ -783,32 +888,43 @@ ZSTD_count_2segments(const BYTE* ip, const BYTE* matc= h, * Hashes ***************************************/ static const U32 prime3bytes =3D 
506832829U; -static U32 ZSTD_hash3(U32 u, U32 h) { return ((u << (32-24)) * prime3by= tes) >> (32-h) ; } -MEM_STATIC size_t ZSTD_hash3Ptr(const void* ptr, U32 h) { return ZSTD_hash= 3(MEM_readLE32(ptr), h); } /* only in zstd_opt.h */ +static U32 ZSTD_hash3(U32 u, U32 h, U32 s) { assert(h <=3D 32); return = (((u << (32-24)) * prime3bytes) ^ s) >> (32-h) ; } +MEM_STATIC size_t ZSTD_hash3Ptr(const void* ptr, U32 h) { return ZSTD_hash= 3(MEM_readLE32(ptr), h, 0); } /* only in zstd_opt.h */ +MEM_STATIC size_t ZSTD_hash3PtrS(const void* ptr, U32 h, U32 s) { return Z= STD_hash3(MEM_readLE32(ptr), h, s); } =20 static const U32 prime4bytes =3D 2654435761U; -static U32 ZSTD_hash4(U32 u, U32 h) { return (u * prime4bytes) >> (32-h= ) ; } -static size_t ZSTD_hash4Ptr(const void* ptr, U32 h) { return ZSTD_hash4(ME= M_read32(ptr), h); } +static U32 ZSTD_hash4(U32 u, U32 h, U32 s) { assert(h <=3D 32); return = ((u * prime4bytes) ^ s) >> (32-h) ; } +static size_t ZSTD_hash4Ptr(const void* ptr, U32 h) { return ZSTD_hash4(ME= M_readLE32(ptr), h, 0); } +static size_t ZSTD_hash4PtrS(const void* ptr, U32 h, U32 s) { return ZSTD_= hash4(MEM_readLE32(ptr), h, s); } =20 static const U64 prime5bytes =3D 889523592379ULL; -static size_t ZSTD_hash5(U64 u, U32 h) { return (size_t)(((u << (64-40)) = * prime5bytes) >> (64-h)) ; } -static size_t ZSTD_hash5Ptr(const void* p, U32 h) { return ZSTD_hash5(MEM_= readLE64(p), h); } +static size_t ZSTD_hash5(U64 u, U32 h, U64 s) { assert(h <=3D 64); return = (size_t)((((u << (64-40)) * prime5bytes) ^ s) >> (64-h)) ; } +static size_t ZSTD_hash5Ptr(const void* p, U32 h) { return ZSTD_hash5(MEM_= readLE64(p), h, 0); } +static size_t ZSTD_hash5PtrS(const void* p, U32 h, U64 s) { return ZSTD_ha= sh5(MEM_readLE64(p), h, s); } =20 static const U64 prime6bytes =3D 227718039650203ULL; -static size_t ZSTD_hash6(U64 u, U32 h) { return (size_t)(((u << (64-48)) = * prime6bytes) >> (64-h)) ; } -static size_t ZSTD_hash6Ptr(const void* p, U32 h) { return ZSTD_hash6(MEM_= readLE64(p), h); } +static size_t ZSTD_hash6(U64 u, U32 h, U64 s) { assert(h <=3D 64); return = (size_t)((((u << (64-48)) * prime6bytes) ^ s) >> (64-h)) ; } +static size_t ZSTD_hash6Ptr(const void* p, U32 h) { return ZSTD_hash6(MEM_= readLE64(p), h, 0); } +static size_t ZSTD_hash6PtrS(const void* p, U32 h, U64 s) { return ZSTD_ha= sh6(MEM_readLE64(p), h, s); } =20 static const U64 prime7bytes =3D 58295818150454627ULL; -static size_t ZSTD_hash7(U64 u, U32 h) { return (size_t)(((u << (64-56)) = * prime7bytes) >> (64-h)) ; } -static size_t ZSTD_hash7Ptr(const void* p, U32 h) { return ZSTD_hash7(MEM_= readLE64(p), h); } +static size_t ZSTD_hash7(U64 u, U32 h, U64 s) { assert(h <=3D 64); return = (size_t)((((u << (64-56)) * prime7bytes) ^ s) >> (64-h)) ; } +static size_t ZSTD_hash7Ptr(const void* p, U32 h) { return ZSTD_hash7(MEM_= readLE64(p), h, 0); } +static size_t ZSTD_hash7PtrS(const void* p, U32 h, U64 s) { return ZSTD_ha= sh7(MEM_readLE64(p), h, s); } =20 static const U64 prime8bytes =3D 0xCF1BBCDCB7A56463ULL; -static size_t ZSTD_hash8(U64 u, U32 h) { return (size_t)(((u) * prime8byte= s) >> (64-h)) ; } -static size_t ZSTD_hash8Ptr(const void* p, U32 h) { return ZSTD_hash8(MEM_= readLE64(p), h); } +static size_t ZSTD_hash8(U64 u, U32 h, U64 s) { assert(h <=3D 64); return = (size_t)((((u) * prime8bytes) ^ s) >> (64-h)) ; } +static size_t ZSTD_hash8Ptr(const void* p, U32 h) { return ZSTD_hash8(MEM_= readLE64(p), h, 0); } +static size_t ZSTD_hash8PtrS(const void* p, U32 h, U64 s) { return ZSTD_ha= sh8(MEM_readLE64(p), h, s); } + 
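+/* Illustrative sketch (not part of upstream): the *PtrS variants above
+ * XOR a caller-provided salt into the product before the final shift,
+ * so a matchfinder can re-key its hash table per context:
+ *
+ *     U64 const salt =3D someSeed;   // hypothetical per-cctx value
+ *     size_t const h =3D ZSTD_hash8PtrS(ip, hBits, salt);
+ *     hashTable[h] =3D curr;         // used exactly like ZSTD_hash8Ptr
+ *
+ * With salt=3D=3D0 each salted variant reduces to its unsalted twin. */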
=20 MEM_STATIC FORCE_INLINE_ATTR size_t ZSTD_hashPtr(const void* p, U32 hBits, U32 mls) { + /* Although some of these hashes do support hBits up to 64, some do no= t. + * To be on the safe side, always avoid hBits > 32. */ + assert(hBits <=3D 32); + switch(mls) { default: @@ -820,6 +936,24 @@ size_t ZSTD_hashPtr(const void* p, U32 hBits, U32 mls) } } =20 +MEM_STATIC FORCE_INLINE_ATTR +size_t ZSTD_hashPtrSalted(const void* p, U32 hBits, U32 mls, const U64 has= hSalt) { + /* Although some of these hashes do support hBits up to 64, some do no= t. + * To be on the safe side, always avoid hBits > 32. */ + assert(hBits <=3D 32); + + switch(mls) + { + default: + case 4: return ZSTD_hash4PtrS(p, hBits, (U32)hashSalt); + case 5: return ZSTD_hash5PtrS(p, hBits, hashSalt); + case 6: return ZSTD_hash6PtrS(p, hBits, hashSalt); + case 7: return ZSTD_hash7PtrS(p, hBits, hashSalt); + case 8: return ZSTD_hash8PtrS(p, hBits, hashSalt); + } +} + + /* ZSTD_ipow() : * Return base^exponent. */ @@ -881,11 +1015,12 @@ MEM_STATIC U64 ZSTD_rollingHash_rotate(U64 hash, BYT= E toRemove, BYTE toAdd, U64 /*-************************************* * Round buffer management ***************************************/ -#if (ZSTD_WINDOWLOG_MAX_64 > 31) -# error "ZSTD_WINDOWLOG_MAX is too large : would overflow ZSTD_CURRENT_MAX" -#endif -/* Max current allowed */ -#define ZSTD_CURRENT_MAX ((3U << 29) + (1U << ZSTD_WINDOWLOG_MAX)) +/* Max @current value allowed: + * In 32-bit mode: we want to avoid crossing the 2 GB limit, + * reducing risks of side effects in case of signed operat= ions on indexes. + * In 64-bit mode: we want to ensure that adding the maximum job size (512= MB) + * doesn't overflow U32 index capacity (4 GB) */ +#define ZSTD_CURRENT_MAX (MEM_64bits() ? 3500U MB : 2000U MB) /* Maximum chunk size before overflow correction needs to be called again = */ #define ZSTD_CHUNKSIZE_MAX = \ ( ((U32)-1) /* Maximum ending current index */ = \ @@ -925,7 +1060,7 @@ MEM_STATIC U32 ZSTD_window_hasExtDict(ZSTD_window_t co= nst window) * Inspects the provided matchState and figures out what dictMode should be * passed to the compressor. */ -MEM_STATIC ZSTD_dictMode_e ZSTD_matchState_dictMode(const ZSTD_matchState_= t *ms) +MEM_STATIC ZSTD_dictMode_e ZSTD_matchState_dictMode(const ZSTD_MatchState_= t *ms) { return ZSTD_window_hasExtDict(ms->window) ? ZSTD_extDict : @@ -1011,7 +1146,9 @@ MEM_STATIC U32 ZSTD_window_needOverflowCorrection(ZST= D_window_t const window, * The least significant cycleLog bits of the indices must remain the same, * which may be 0. Every index up to maxDist in the past must be valid. */ -MEM_STATIC U32 ZSTD_window_correctOverflow(ZSTD_window_t* window, U32 cycl= eLog, +MEM_STATIC +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +U32 ZSTD_window_correctOverflow(ZSTD_window_t* window, U32 cycleLog, U32 maxDist, void const* src) { /* preemptive overflow correction: @@ -1112,7 +1249,7 @@ ZSTD_window_enforceMaxDist(ZSTD_window_t* window, const void* blockEnd, U32 maxDist, U32* loadedDictEndPtr, - const ZSTD_matchState_t** dictMatchStatePtr) + const ZSTD_MatchState_t** dictMatchStatePtr) { U32 const blockEndIdx =3D (U32)((BYTE const*)blockEnd - window->base); U32 const loadedDictEnd =3D (loadedDictEndPtr !=3D NULL) ? 
*loadedDict= EndPtr : 0; @@ -1157,7 +1294,7 @@ ZSTD_checkDictValidity(const ZSTD_window_t* window, const void* blockEnd, U32 maxDist, U32* loadedDictEndPtr, - const ZSTD_matchState_t** dictMatchStatePtr) + const ZSTD_MatchState_t** dictMatchStatePtr) { assert(loadedDictEndPtr !=3D NULL); assert(dictMatchStatePtr !=3D NULL); @@ -1167,10 +1304,15 @@ ZSTD_checkDictValidity(const ZSTD_window_t* window, (unsigned)blockEndIdx, (unsigned)maxDist, (unsigned)lo= adedDictEnd); assert(blockEndIdx >=3D loadedDictEnd); =20 - if (blockEndIdx > loadedDictEnd + maxDist) { + if (blockEndIdx > loadedDictEnd + maxDist || loadedDictEnd !=3D wi= ndow->dictLimit) { /* On reaching window size, dictionaries are invalidated. * For simplification, if window size is reached anywhere with= in next block, * the dictionary is invalidated for the full block. + * + * We also have to invalidate the dictionary if ZSTD_window_up= date() has detected + * non-contiguous segments, which means that loadedDictEnd != =3D window->dictLimit. + * loadedDictEnd may be 0, if forceWindow is true, but in that= case we never use + * dictMatchState, so setting it to NULL is not a problem. */ DEBUGLOG(6, "invalidating dictionary for current block (distan= ce > windowSize)"); *loadedDictEndPtr =3D 0; @@ -1199,9 +1341,11 @@ MEM_STATIC void ZSTD_window_init(ZSTD_window_t* wind= ow) { * forget about the extDict. Handles overlap of the prefix and extDict. * Returns non-zero if the segment is contiguous. */ -MEM_STATIC U32 ZSTD_window_update(ZSTD_window_t* window, - void const* src, size_t srcSize, - int forceNonContiguous) +MEM_STATIC +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +U32 ZSTD_window_update(ZSTD_window_t* window, + const void* src, size_t srcSize, + int forceNonContiguous) { BYTE const* const ip =3D (BYTE const*)src; U32 contiguous =3D 1; @@ -1228,8 +1372,9 @@ MEM_STATIC U32 ZSTD_window_update(ZSTD_window_t* wind= ow, /* if input and dictionary overlap : reduce dictionary (area presumed = modified by input) */ if ( (ip+srcSize > window->dictBase + window->lowLimit) & (ip < window->dictBase + window->dictLimit)) { - ptrdiff_t const highInputIdx =3D (ip + srcSize) - window->dictBase; - U32 const lowLimitMax =3D (highInputIdx > (ptrdiff_t)window->dictL= imit) ? window->dictLimit : (U32)highInputIdx; + size_t const highInputIdx =3D (size_t)((ip + srcSize) - window->di= ctBase); + U32 const lowLimitMax =3D (highInputIdx > (size_t)window->dictLimi= t) ? window->dictLimit : (U32)highInputIdx; + assert(highInputIdx < UINT_MAX); window->lowLimit =3D lowLimitMax; DEBUGLOG(5, "Overlapping extDict and input : new lowLimit =3D %u",= window->lowLimit); } @@ -1239,7 +1384,7 @@ MEM_STATIC U32 ZSTD_window_update(ZSTD_window_t* wind= ow, /* * Returns the lowest allowed match index. It may either be in the ext-dic= t or the prefix. */ -MEM_STATIC U32 ZSTD_getLowestMatchIndex(const ZSTD_matchState_t* ms, U32 c= urr, unsigned windowLog) +MEM_STATIC U32 ZSTD_getLowestMatchIndex(const ZSTD_MatchState_t* ms, U32 c= urr, unsigned windowLog) { U32 const maxDistance =3D 1U << windowLog; U32 const lowestValid =3D ms->window.lowLimit; @@ -1256,7 +1401,7 @@ MEM_STATIC U32 ZSTD_getLowestMatchIndex(const ZSTD_ma= tchState_t* ms, U32 curr, u /* * Returns the lowest allowed match index in the prefix. 
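 * (in effect max(window.dictLimit, curr - (1<<windowLog));
 *  paraphrase added for illustration)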
*/ -MEM_STATIC U32 ZSTD_getLowestPrefixIndex(const ZSTD_matchState_t* ms, U32 = curr, unsigned windowLog) +MEM_STATIC U32 ZSTD_getLowestPrefixIndex(const ZSTD_MatchState_t* ms, U32 = curr, unsigned windowLog) { U32 const maxDistance =3D 1U << windowLog; U32 const lowestValid =3D ms->window.dictLimit; @@ -1269,6 +1414,13 @@ MEM_STATIC U32 ZSTD_getLowestPrefixIndex(const ZSTD_= matchState_t* ms, U32 curr, return matchLowest; } =20 +/* index_safety_check: + * intentional underflow : ensure repIndex isn't overlapping dict + prefix + * @return 1 if values are not overlapping, + * 0 otherwise */ +MEM_STATIC int ZSTD_index_overlap_check(const U32 prefixLowestIndex, const= U32 repIndex) { + return ((U32)((prefixLowestIndex-1) - repIndex) >=3D 3); +} =20 =20 /* debug functions */ @@ -1302,7 +1454,42 @@ MEM_STATIC void ZSTD_debugTable(const U32* table, U3= 2 max) =20 #endif =20 +/* Short Cache */ + +/* Normally, zstd matchfinders follow this flow: + * 1. Compute hash at ip + * 2. Load index from hashTable[hash] + * 3. Check if *ip =3D=3D *(base + index) + * In dictionary compression, loading *(base + index) is often an L2 or ev= en L3 miss. + * + * Short cache is an optimization which allows us to avoid step 3 most of = the time + * when the data doesn't actually match. With short cache, the flow become= s: + * 1. Compute (hash, currentTag) at ip. currentTag is an 8-bit indepen= dent hash at ip. + * 2. Load (index, matchTag) from hashTable[hash]. See ZSTD_writeTagge= dIndex to understand how this works. + * 3. Only if currentTag =3D=3D matchTag, check *ip =3D=3D *(base + in= dex). Otherwise, continue. + * + * Currently, short cache is only implemented in CDict hashtables. Thus, i= ts use is limited to + * dictMatchState matchfinders. + */ +#define ZSTD_SHORT_CACHE_TAG_BITS 8 +#define ZSTD_SHORT_CACHE_TAG_MASK ((1u << ZSTD_SHORT_CACHE_TAG_BITS) - 1) + +/* Helper function for ZSTD_fillHashTable and ZSTD_fillDoubleHashTable. + * Unpacks hashAndTag into (hash, tag), then packs (index, tag) into hashT= able[hash]. */ +MEM_STATIC void ZSTD_writeTaggedIndex(U32* const hashTable, size_t hashAnd= Tag, U32 index) { + size_t const hash =3D hashAndTag >> ZSTD_SHORT_CACHE_TAG_BITS; + U32 const tag =3D (U32)(hashAndTag & ZSTD_SHORT_CACHE_TAG_MASK); + assert(index >> (32 - ZSTD_SHORT_CACHE_TAG_BITS) =3D=3D 0); + hashTable[hash] =3D (index << ZSTD_SHORT_CACHE_TAG_BITS) | tag; +} =20 +/* Helper function for short cache matchfinders. + * Unpacks tag1 and tag2 from lower bits of packedTag1 and packedTag2, the= n checks if the tags match. 
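+ * Worked example (illustrative): after ZSTD_writeTaggedIndex() stores
+ * index 0x123456 with tag 0xAB as 0x123456AB, two packed entries
+ * compare equal here iff their low ZSTD_SHORT_CACHE_TAG_BITS bits
+ * agree.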
*/ +MEM_STATIC int ZSTD_comparePackedTags(size_t packedTag1, size_t packedTag2= ) { + U32 const tag1 =3D packedTag1 & ZSTD_SHORT_CACHE_TAG_MASK; + U32 const tag2 =3D packedTag2 & ZSTD_SHORT_CACHE_TAG_MASK; + return tag1 =3D=3D tag2; +} =20 /* =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D * Shared internal declarations @@ -1319,6 +1506,25 @@ size_t ZSTD_loadCEntropy(ZSTD_compressedBlockState_t= * bs, void* workspace, =20 void ZSTD_reset_compressedBlockState(ZSTD_compressedBlockState_t* bs); =20 +typedef struct { + U32 idx; /* Index in array of ZSTD_Sequence */ + U32 posInSequence; /* Position within sequence at idx */ + size_t posInSrc; /* Number of bytes given by sequences provided so = far */ +} ZSTD_SequencePosition; + +/* for benchmark */ +size_t ZSTD_convertBlockSequences(ZSTD_CCtx* cctx, + const ZSTD_Sequence* const inSeqs, size_t nbSequen= ces, + int const repcodeResolution); + +typedef struct { + size_t nbSequences; + size_t blockSize; + size_t litSize; +} BlockSummary; + +BlockSummary ZSTD_get1BlockSummary(const ZSTD_Sequence* seqs, size_t nbSeq= s); + /* =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D * Private declarations * These prototypes shall only be called from within lib/compress @@ -1330,7 +1536,7 @@ void ZSTD_reset_compressedBlockState(ZSTD_compressedB= lockState_t* bs); * Note: srcSizeHint =3D=3D 0 means 0! */ ZSTD_compressionParameters ZSTD_getCParamsFromCCtxParams( - const ZSTD_CCtx_params* CCtxParams, U64 srcSizeHint, size_t dictSi= ze, ZSTD_cParamMode_e mode); + const ZSTD_CCtx_params* CCtxParams, U64 srcSizeHint, size_t dictSi= ze, ZSTD_CParamMode_e mode); =20 /*! ZSTD_initCStream_internal() : * Private use only. Init streaming operation. @@ -1342,7 +1548,7 @@ size_t ZSTD_initCStream_internal(ZSTD_CStream* zcs, const ZSTD_CDict* cdict, const ZSTD_CCtx_params* params, unsigned long long pl= edgedSrcSize); =20 -void ZSTD_resetSeqStore(seqStore_t* ssPtr); +void ZSTD_resetSeqStore(SeqStore_t* ssPtr); =20 /*! ZSTD_getCParamsFromCDict() : * as the name implies */ @@ -1381,11 +1587,10 @@ size_t ZSTD_writeLastEmptyBlock(void* dst, size_t d= stCapacity); * This cannot be used when long range matching is enabled. * Zstd will use these sequences, and pass the literals to a secondary blo= ck * compressor. - * @return : An error code on failure. * NOTE: seqs are not verified! Invalid sequences can cause out-of-bounds = memory * access and data corruption. */ -size_t ZSTD_referenceExternalSequences(ZSTD_CCtx* cctx, rawSeq* seq, size_= t nbSeq); +void ZSTD_referenceExternalSequences(ZSTD_CCtx* cctx, rawSeq* seq, size_t = nbSeq); =20 /* ZSTD_cycleLog() : * condition for correct operation : hashLog > 1 */ @@ -1396,4 +1601,28 @@ U32 ZSTD_cycleLog(U32 hashLog, ZSTD_strategy strat); */ void ZSTD_CCtx_trace(ZSTD_CCtx* cctx, size_t extraCSize); =20 +/* Returns 1 if an external sequence producer is registered, otherwise ret= urns 0. 
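+ * (Illustrative note: callers can branch on this to route block
+ * compression to params->extSeqProdFunc, with fallback behavior
+ * governed by enableMatchFinderFallback above.)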
*/ +MEM_STATIC int ZSTD_hasExtSeqProd(const ZSTD_CCtx_params* params) { + return params->extSeqProdFunc !=3D NULL; +} + +/* =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + * Deprecated definitions that are still used internally to avoid + * deprecation warnings. These functions are exactly equivalent to + * their public variants, but avoid the deprecation warnings. + * =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D */ + +size_t ZSTD_compressBegin_usingCDict_deprecated(ZSTD_CCtx* cctx, const ZST= D_CDict* cdict); + +size_t ZSTD_compressContinue_public(ZSTD_CCtx* cctx, + void* dst, size_t dstCapacity, + const void* src, size_t srcSize); + +size_t ZSTD_compressEnd_public(ZSTD_CCtx* cctx, + void* dst, size_t dstCapacity, + const void* src, size_t srcSize); + +size_t ZSTD_compressBlock_deprecated(ZSTD_CCtx* cctx, void* dst, size_t ds= tCapacity, const void* src, size_t srcSize); + + #endif /* ZSTD_COMPRESS_H */ diff --git a/lib/zstd/compress/zstd_compress_literals.c b/lib/zstd/compress= /zstd_compress_literals.c index 52b0a8059aba..ec39b4299b6f 100644 --- a/lib/zstd/compress/zstd_compress_literals.c +++ b/lib/zstd/compress/zstd_compress_literals.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -13,11 +14,36 @@ ***************************************/ #include "zstd_compress_literals.h" =20 + +/* ************************************************************** +* Debug Traces +****************************************************************/ +#if DEBUGLEVEL >=3D 2 + +static size_t showHexa(const void* src, size_t srcSize) +{ + const BYTE* const ip =3D (const BYTE*)src; + size_t u; + for (u=3D0; u31) + (srcSize>4095); =20 + DEBUGLOG(5, "ZSTD_noCompressLiterals: srcSize=3D%zu, dstCapacity=3D%zu= ", srcSize, dstCapacity); + RETURN_ERROR_IF(srcSize + flSize > dstCapacity, dstSize_tooSmall, ""); =20 switch(flSize) @@ -36,16 +62,30 @@ size_t ZSTD_noCompressLiterals (void* dst, size_t dstCa= pacity, const void* src, } =20 ZSTD_memcpy(ostart + flSize, src, srcSize); - DEBUGLOG(5, "Raw literals: %u -> %u", (U32)srcSize, (U32)(srcSize + fl= Size)); + DEBUGLOG(5, "Raw (uncompressed) literals: %u -> %u", (U32)srcSize, (U3= 2)(srcSize + flSize)); return srcSize + flSize; } =20 +static int allBytesIdentical(const void* src, size_t srcSize) +{ + assert(srcSize >=3D 1); + assert(src !=3D NULL); + { const BYTE b =3D ((const BYTE*)src)[0]; + size_t p; + for (p=3D1; p31) + (srcSize>4095); =20 - (void)dstCapacity; /* dstCapacity already guaranteed to be >=3D4, hen= ce large enough */ + assert(dstCapacity >=3D 4); (void)dstCapacity; + assert(allBytesIdentical(src, srcSize)); =20 switch(flSize) { @@ -63,28 +103,51 @@ size_t ZSTD_compressRleLiteralsBlock (void* dst, size_= t dstCapacity, const void* } =20 ostart[flSize] =3D *(const BYTE*)src; - DEBUGLOG(5, "RLE literals: %u -> %u", (U32)srcSize, (U32)flSize + 1); + DEBUGLOG(5, "RLE : Repeated Literal (%02X: %u times) -> %u bytes encod= ed", ((const BYTE*)src)[0], (U32)srcSize, (U32)flSize + 1); return flSize+1; } =20 -size_t ZSTD_compressLiterals 
(ZSTD_hufCTables_t const* prevHuf, - ZSTD_hufCTables_t* nextHuf, - ZSTD_strategy strategy, int disableLiteralCo= mpression, - void* dst, size_t dstCapacity, - const void* src, size_t srcSize, - void* entropyWorkspace, size_t entropyWorksp= aceSize, - const int bmi2, - unsigned suspectUncompressible) +/* ZSTD_minLiteralsToCompress() : + * returns minimal amount of literals + * for literal compression to even be attempted. + * Minimum is made tighter as compression strategy increases. + */ +static size_t +ZSTD_minLiteralsToCompress(ZSTD_strategy strategy, HUF_repeat huf_repeat) +{ + assert((int)strategy >=3D 0); + assert((int)strategy <=3D 9); + /* btultra2 : min 8 bytes; + * then 2x larger for each successive compression strategy + * max threshold 64 bytes */ + { int const shift =3D MIN(9-(int)strategy, 3); + size_t const mintc =3D (huf_repeat =3D=3D HUF_repeat_valid) ? 6 : = (size_t)8 << shift; + DEBUGLOG(7, "minLiteralsToCompress =3D %zu", mintc); + return mintc; + } +} + +size_t ZSTD_compressLiterals ( + void* dst, size_t dstCapacity, + const void* src, size_t srcSize, + void* entropyWorkspace, size_t entropyWorkspaceSize, + const ZSTD_hufCTables_t* prevHuf, + ZSTD_hufCTables_t* nextHuf, + ZSTD_strategy strategy, + int disableLiteralCompression, + int suspectUncompressible, + int bmi2) { - size_t const minGain =3D ZSTD_minGain(srcSize, strategy); size_t const lhSize =3D 3 + (srcSize >=3D 1 KB) + (srcSize >=3D 16 KB); BYTE* const ostart =3D (BYTE*)dst; U32 singleStream =3D srcSize < 256; - symbolEncodingType_e hType =3D set_compressed; + SymbolEncodingType_e hType =3D set_compressed; size_t cLitSize; =20 - DEBUGLOG(5,"ZSTD_compressLiterals (disableLiteralCompression=3D%i srcS= ize=3D%u)", - disableLiteralCompression, (U32)srcSize); + DEBUGLOG(5,"ZSTD_compressLiterals (disableLiteralCompression=3D%i, src= Size=3D%u, dstCapacity=3D%zu)", + disableLiteralCompression, (U32)srcSize, dstCapacity); + + DEBUGLOG(6, "Completed literals listing (%zu bytes)", showHexa(src, sr= cSize)); =20 /* Prepare nextEntropy assuming reusing the existing table */ ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf)); @@ -92,40 +155,51 @@ size_t ZSTD_compressLiterals (ZSTD_hufCTables_t const*= prevHuf, if (disableLiteralCompression) return ZSTD_noCompressLiterals(dst, dstCapacity, src, srcSize); =20 - /* small ? don't even attempt compression (speed opt) */ -# define COMPRESS_LITERALS_SIZE_MIN 63 - { size_t const minLitSize =3D (prevHuf->repeatMode =3D=3D HUF_repeat= _valid) ? 6 : COMPRESS_LITERALS_SIZE_MIN; - if (srcSize <=3D minLitSize) return ZSTD_noCompressLiterals(dst, d= stCapacity, src, srcSize); - } + /* if too small, don't even attempt compression (speed opt) */ + if (srcSize < ZSTD_minLiteralsToCompress(strategy, prevHuf->repeatMode= )) + return ZSTD_noCompressLiterals(dst, dstCapacity, src, srcSize); =20 RETURN_ERROR_IF(dstCapacity < lhSize+1, dstSize_tooSmall, "not enough = space for compression"); { HUF_repeat repeat =3D prevHuf->repeatMode; - int const preferRepeat =3D strategy < ZSTD_lazy ? srcSize <=3D 102= 4 : 0; + int const flags =3D 0 + | (bmi2 ? HUF_flags_bmi2 : 0) + | (strategy < ZSTD_lazy && srcSize <=3D 1024 ? HUF_flags_prefe= rRepeat : 0) + | (strategy >=3D HUF_OPTIMAL_DEPTH_THRESHOLD ? HUF_flags_optim= alDepth : 0) + | (suspectUncompressible ? 
HUF_flags_suspectUncompressible : 0= ); + + typedef size_t (*huf_compress_f)(void*, size_t, const void*, size_= t, unsigned, unsigned, void*, size_t, HUF_CElt*, HUF_repeat*, int); + huf_compress_f huf_compress; if (repeat =3D=3D HUF_repeat_valid && lhSize =3D=3D 3) singleStrea= m =3D 1; - cLitSize =3D singleStream ? - HUF_compress1X_repeat( - ostart+lhSize, dstCapacity-lhSize, src, srcSize, - HUF_SYMBOLVALUE_MAX, HUF_TABLELOG_DEFAULT, entropyWorkspac= e, entropyWorkspaceSize, - (HUF_CElt*)nextHuf->CTable, &repeat, preferRepeat, bmi2, s= uspectUncompressible) : - HUF_compress4X_repeat( - ostart+lhSize, dstCapacity-lhSize, src, srcSize, - HUF_SYMBOLVALUE_MAX, HUF_TABLELOG_DEFAULT, entropyWorkspac= e, entropyWorkspaceSize, - (HUF_CElt*)nextHuf->CTable, &repeat, preferRepeat, bmi2, s= uspectUncompressible); + huf_compress =3D singleStream ? HUF_compress1X_repeat : HUF_compre= ss4X_repeat; + cLitSize =3D huf_compress(ostart+lhSize, dstCapacity-lhSize, + src, srcSize, + HUF_SYMBOLVALUE_MAX, LitHufLog, + entropyWorkspace, entropyWorkspaceSize, + (HUF_CElt*)nextHuf->CTable, + &repeat, flags); + DEBUGLOG(5, "%zu literals compressed into %zu bytes (before header= )", srcSize, cLitSize); if (repeat !=3D HUF_repeat_none) { /* reused the existing table */ - DEBUGLOG(5, "Reusing previous huffman table"); + DEBUGLOG(5, "reusing statistics from previous huffman block"); hType =3D set_repeat; } } =20 - if ((cLitSize=3D=3D0) || (cLitSize >=3D srcSize - minGain) || ERR_isEr= ror(cLitSize)) { - ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf)); - return ZSTD_noCompressLiterals(dst, dstCapacity, src, srcSize); - } + { size_t const minGain =3D ZSTD_minGain(srcSize, strategy); + if ((cLitSize=3D=3D0) || (cLitSize >=3D srcSize - minGain) || ERR_= isError(cLitSize)) { + ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf)); + return ZSTD_noCompressLiterals(dst, dstCapacity, src, srcSize); + } } if (cLitSize=3D=3D1) { - ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf)); - return ZSTD_compressRleLiteralsBlock(dst, dstCapacity, src, srcSiz= e); - } + /* A return value of 1 signals that the alphabet consists of a sin= gle symbol. + * However, in some rare circumstances, it could be the compressed= size (a single byte). + * For that outcome to have a chance to happen, it's necessary tha= t `srcSize < 8`. + * (it's also necessary to not generate statistics). + * Therefore, in such a case, actively check that all bytes are id= entical. 
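+ * (Example of the rare case: with reused statistics, 4 literals at
+ * 2 bits each fit in a single byte, so cLitSize=3D=3D1 alone does not
+ * prove an RLE block.)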
*/ + if ((srcSize >=3D 8) || allBytesIdentical(src, srcSize)) { + ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf)); + return ZSTD_compressRleLiteralsBlock(dst, dstCapacity, src, sr= cSize); + } } =20 if (hType =3D=3D set_compressed) { /* using a newly constructed table */ @@ -136,16 +210,19 @@ size_t ZSTD_compressLiterals (ZSTD_hufCTables_t const= * prevHuf, switch(lhSize) { case 3: /* 2 - 2 - 10 - 10 */ - { U32 const lhc =3D hType + ((!singleStream) << 2) + ((U32)srcSi= ze<<4) + ((U32)cLitSize<<14); + if (!singleStream) assert(srcSize >=3D MIN_LITERALS_FOR_4_STREAMS); + { U32 const lhc =3D hType + ((U32)(!singleStream) << 2) + ((U32)= srcSize<<4) + ((U32)cLitSize<<14); MEM_writeLE24(ostart, lhc); break; } case 4: /* 2 - 2 - 14 - 14 */ + assert(srcSize >=3D MIN_LITERALS_FOR_4_STREAMS); { U32 const lhc =3D hType + (2 << 2) + ((U32)srcSize<<4) + ((U32= )cLitSize<<18); MEM_writeLE32(ostart, lhc); break; } case 5: /* 2 - 2 - 18 - 18 */ + assert(srcSize >=3D MIN_LITERALS_FOR_4_STREAMS); { U32 const lhc =3D hType + (3 << 2) + ((U32)srcSize<<4) + ((U32= )cLitSize<<22); MEM_writeLE32(ostart, lhc); ostart[4] =3D (BYTE)(cLitSize >> 10); diff --git a/lib/zstd/compress/zstd_compress_literals.h b/lib/zstd/compress= /zstd_compress_literals.h index 9775fb97cb70..a2a85d6b69e5 100644 --- a/lib/zstd/compress/zstd_compress_literals.h +++ b/lib/zstd/compress/zstd_compress_literals.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -16,16 +17,24 @@ =20 size_t ZSTD_noCompressLiterals (void* dst, size_t dstCapacity, const void*= src, size_t srcSize); =20 +/* ZSTD_compressRleLiteralsBlock() : + * Conditions : + * - All bytes in @src are identical + * - dstCapacity >=3D 4 */ size_t ZSTD_compressRleLiteralsBlock (void* dst, size_t dstCapacity, const= void* src, size_t srcSize); =20 -/* If suspectUncompressible then some sampling checks will be run to poten= tially skip huffman coding */ -size_t ZSTD_compressLiterals (ZSTD_hufCTables_t const* prevHuf, - ZSTD_hufCTables_t* nextHuf, - ZSTD_strategy strategy, int disableLiteralCo= mpression, - void* dst, size_t dstCapacity, +/* ZSTD_compressLiterals(): + * @entropyWorkspace: must be aligned on 4-bytes boundaries + * @entropyWorkspaceSize : must be >=3D HUF_WORKSPACE_SIZE + * @suspectUncompressible: sampling checks, to potentially skip huffman co= ding + */ +size_t ZSTD_compressLiterals (void* dst, size_t dstCapacity, const void* src, size_t srcSize, void* entropyWorkspace, size_t entropyWorksp= aceSize, - const int bmi2, - unsigned suspectUncompressible); + const ZSTD_hufCTables_t* prevHuf, + ZSTD_hufCTables_t* nextHuf, + ZSTD_strategy strategy, int disableLiteralCo= mpression, + int suspectUncompressible, + int bmi2); =20 #endif /* ZSTD_COMPRESS_LITERALS_H */ diff --git a/lib/zstd/compress/zstd_compress_sequences.c b/lib/zstd/compres= s/zstd_compress_sequences.c index 21ddc1b37acf..256980c9d85a 100644 --- a/lib/zstd/compress/zstd_compress_sequences.c +++ b/lib/zstd/compress/zstd_compress_sequences.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -58,7 +59,7 @@ static unsigned ZSTD_useLowProbCount(size_t const nbSeq) { /* Heuristic: This should cover most blocks <=3D 16K and * start to fade out after 16K to about 32K depending on - * comprssibility. + * compressibility. */ return nbSeq >=3D 2048; } @@ -153,20 +154,20 @@ size_t ZSTD_crossEntropyCost(short const* norm, unsig= ned accuracyLog, return cost >> 8; } =20 -symbolEncodingType_e +SymbolEncodingType_e ZSTD_selectEncodingType( FSE_repeat* repeatMode, unsigned const* count, unsigned const max, size_t const mostFrequent, size_t nbSeq, unsigned const FSELog, FSE_CTable const* prevCTable, short const* defaultNorm, U32 defaultNormLog, - ZSTD_defaultPolicy_e const isDefaultAllowed, + ZSTD_DefaultPolicy_e const isDefaultAllowed, ZSTD_strategy const strategy) { ZSTD_STATIC_ASSERT(ZSTD_defaultDisallowed =3D=3D 0 && ZSTD_defaultAllo= wed !=3D 0); if (mostFrequent =3D=3D nbSeq) { *repeatMode =3D FSE_repeat_none; if (isDefaultAllowed && nbSeq <=3D 2) { - /* Prefer set_basic over set_rle when there are 2 or less symb= ols, + /* Prefer set_basic over set_rle when there are 2 or fewer sym= bols, * since RLE uses 1 byte, but set_basic uses 5-6 bits per symb= ol. * If basic encoding isn't possible, always choose RLE. */ @@ -241,7 +242,7 @@ typedef struct { =20 size_t ZSTD_buildCTable(void* dst, size_t dstCapacity, - FSE_CTable* nextCTable, U32 FSELog, symbolEncodingType_e t= ype, + FSE_CTable* nextCTable, U32 FSELog, SymbolEncodingType_e t= ype, unsigned* count, U32 max, const BYTE* codeTable, size_t nbSeq, const S16* defaultNorm, U32 defaultNormLog, U32 defaultMax, @@ -293,7 +294,7 @@ ZSTD_encodeSequences_body( FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable, FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable, FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable, - seqDef const* sequences, size_t nbSeq, int longOffsets) + SeqDef const* sequences, size_t nbSeq, int longOffsets) { BIT_CStream_t blockStream; FSE_CState_t stateMatchLength; @@ -387,7 +388,7 @@ ZSTD_encodeSequences_default( FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable, FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable, FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable, - seqDef const* sequences, size_t nbSeq, int longOffsets) + SeqDef const* sequences, size_t nbSeq, int longOffsets) { return ZSTD_encodeSequences_body(dst, dstCapacity, CTable_MatchLength, mlCodeTable, @@ -405,7 +406,7 @@ ZSTD_encodeSequences_bmi2( FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable, FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable, FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable, - seqDef const* sequences, size_t nbSeq, int longOffsets) + SeqDef const* sequences, size_t nbSeq, int longOffsets) { return ZSTD_encodeSequences_body(dst, dstCapacity, CTable_MatchLength, mlCodeTable, @@ -421,7 +422,7 @@ size_t ZSTD_encodeSequences( FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable, FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable, FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable, - seqDef const* sequences, size_t nbSeq, int longOffsets, int bm= i2) + SeqDef const* sequences, size_t nbSeq, int longOffsets, int bm= i2) { DEBUGLOG(5, "ZSTD_encodeSequences: dstCapacity =3D %u", (unsigned)dstC= apacity); #if DYNAMIC_BMI2 diff --git a/lib/zstd/compress/zstd_compress_sequences.h b/lib/zstd/compres= s/zstd_compress_sequences.h index 
7991364c2f71..14fdccb6547f 100644 --- a/lib/zstd/compress/zstd_compress_sequences.h +++ b/lib/zstd/compress/zstd_compress_sequences.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -11,26 +12,27 @@ #ifndef ZSTD_COMPRESS_SEQUENCES_H #define ZSTD_COMPRESS_SEQUENCES_H =20 +#include "zstd_compress_internal.h" /* SeqDef */ #include "../common/fse.h" /* FSE_repeat, FSE_CTable */ -#include "../common/zstd_internal.h" /* symbolEncodingType_e, ZSTD_strateg= y */ +#include "../common/zstd_internal.h" /* SymbolEncodingType_e, ZSTD_strateg= y */ =20 typedef enum { ZSTD_defaultDisallowed =3D 0, ZSTD_defaultAllowed =3D 1 -} ZSTD_defaultPolicy_e; +} ZSTD_DefaultPolicy_e; =20 -symbolEncodingType_e +SymbolEncodingType_e ZSTD_selectEncodingType( FSE_repeat* repeatMode, unsigned const* count, unsigned const max, size_t const mostFrequent, size_t nbSeq, unsigned const FSELog, FSE_CTable const* prevCTable, short const* defaultNorm, U32 defaultNormLog, - ZSTD_defaultPolicy_e const isDefaultAllowed, + ZSTD_DefaultPolicy_e const isDefaultAllowed, ZSTD_strategy const strategy); =20 size_t ZSTD_buildCTable(void* dst, size_t dstCapacity, - FSE_CTable* nextCTable, U32 FSELog, symbolEncodingType_e t= ype, + FSE_CTable* nextCTable, U32 FSELog, SymbolEncodingType_e t= ype, unsigned* count, U32 max, const BYTE* codeTable, size_t nbSeq, const S16* defaultNorm, U32 defaultNormLog, U32 defaultMax, @@ -42,7 +44,7 @@ size_t ZSTD_encodeSequences( FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable, FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable, FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable, - seqDef const* sequences, size_t nbSeq, int longOffsets, int bm= i2); + SeqDef const* sequences, size_t nbSeq, int longOffsets, int bm= i2); =20 size_t ZSTD_fseBitCost( FSE_CTable const* ctable, diff --git a/lib/zstd/compress/zstd_compress_superblock.c b/lib/zstd/compre= ss/zstd_compress_superblock.c index 17d836cc84e8..dc12d64e935c 100644 --- a/lib/zstd/compress/zstd_compress_superblock.c +++ b/lib/zstd/compress/zstd_compress_superblock.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -36,13 +37,14 @@ * If it is set_compressed, first sub-block's literals section will b= e Treeless_Literals_Block * and the following sub-blocks' literals sections will be Treeless_L= iterals_Block. * @return : compressed size of literals section of a sub-block - * Or 0 if it unable to compress. + * Or 0 if unable to compress. * Or error code */ -static size_t ZSTD_compressSubBlock_literal(const HUF_CElt* hufTable, - const ZSTD_hufCTablesMetadata_t* hufMe= tadata, - const BYTE* literals, size_t litSize, - void* dst, size_t dstSize, - const int bmi2, int writeEntropy, int*= entropyWritten) +static size_t +ZSTD_compressSubBlock_literal(const HUF_CElt* hufTable, + const ZSTD_hufCTablesMetadata_t* hufMetadata, + const BYTE* literals, size_t litSize, + void* dst, size_t dstSize, + const int bmi2, int writeEntropy, int* entro= pyWritten) { size_t const header =3D writeEntropy ? 
200 : 0; size_t const lhSize =3D 3 + (litSize >=3D (1 KB - header)) + (litSize = >=3D (16 KB - header)); @@ -50,11 +52,9 @@ static size_t ZSTD_compressSubBlock_literal(const HUF_CE= lt* hufTable, BYTE* const oend =3D ostart + dstSize; BYTE* op =3D ostart + lhSize; U32 const singleStream =3D lhSize =3D=3D 3; - symbolEncodingType_e hType =3D writeEntropy ? hufMetadata->hType : set= _repeat; + SymbolEncodingType_e hType =3D writeEntropy ? hufMetadata->hType : set= _repeat; size_t cLitSize =3D 0; =20 - (void)bmi2; /* TODO bmi2... */ - DEBUGLOG(5, "ZSTD_compressSubBlock_literal (litSize=3D%zu, lhSize=3D%z= u, writeEntropy=3D%d)", litSize, lhSize, writeEntropy); =20 *entropyWritten =3D 0; @@ -76,9 +76,9 @@ static size_t ZSTD_compressSubBlock_literal(const HUF_CEl= t* hufTable, DEBUGLOG(5, "ZSTD_compressSubBlock_literal (hSize=3D%zu)", hufMeta= data->hufDesSize); } =20 - /* TODO bmi2 */ - { const size_t cSize =3D singleStream ? HUF_compress1X_usingCTable(o= p, oend-op, literals, litSize, hufTable) - : HUF_compress4X_usingCTable(op,= oend-op, literals, litSize, hufTable); + { int const flags =3D bmi2 ? HUF_flags_bmi2 : 0; + const size_t cSize =3D singleStream ? HUF_compress1X_usingCTable(o= p, (size_t)(oend-op), literals, litSize, hufTable, flags) + : HUF_compress4X_usingCTable(op,= (size_t)(oend-op), literals, litSize, hufTable, flags); op +=3D cSize; cLitSize +=3D cSize; if (cSize =3D=3D 0 || ERR_isError(cSize)) { @@ -103,7 +103,7 @@ static size_t ZSTD_compressSubBlock_literal(const HUF_C= Elt* hufTable, switch(lhSize) { case 3: /* 2 - 2 - 10 - 10 */ - { U32 const lhc =3D hType + ((!singleStream) << 2) + ((U32)litSi= ze<<4) + ((U32)cLitSize<<14); + { U32 const lhc =3D hType + ((U32)(!singleStream) << 2) + ((U32)= litSize<<4) + ((U32)cLitSize<<14); MEM_writeLE24(ostart, lhc); break; } @@ -123,26 +123,30 @@ static size_t ZSTD_compressSubBlock_literal(const HUF= _CElt* hufTable, } *entropyWritten =3D 1; DEBUGLOG(5, "Compressed literals: %u -> %u", (U32)litSize, (U32)(op-os= tart)); - return op-ostart; + return (size_t)(op-ostart); } =20 -static size_t ZSTD_seqDecompressedSize(seqStore_t const* seqStore, const s= eqDef* sequences, size_t nbSeq, size_t litSize, int lastSequence) { - const seqDef* const sstart =3D sequences; - const seqDef* const send =3D sequences + nbSeq; - const seqDef* sp =3D sstart; +static size_t +ZSTD_seqDecompressedSize(SeqStore_t const* seqStore, + const SeqDef* sequences, size_t nbSeqs, + size_t litSize, int lastSubBlock) +{ size_t matchLengthSum =3D 0; size_t litLengthSum =3D 0; - (void)(litLengthSum); /* suppress unused variable warning on some envi= ronments */ - while (send-sp > 0) { - ZSTD_sequenceLength const seqLen =3D ZSTD_getSequenceLength(seqSto= re, sp); + size_t n; + for (n=3D0; ncParams.windowLog > STREAM_ACCUM= ULATOR_MIN; BYTE* const ostart =3D (BYTE*)dst; @@ -176,14 +181,14 @@ static size_t ZSTD_compressSubBlock_sequences(const Z= STD_fseCTables_t* fseTables /* Sequences Header */ RETURN_ERROR_IF((oend-op) < 3 /*max nbSeq Size*/ + 1 /*seqHead*/, dstSize_tooSmall, ""); - if (nbSeq < 0x7F) + if (nbSeq < 128) *op++ =3D (BYTE)nbSeq; else if (nbSeq < LONGNBSEQ) op[0] =3D (BYTE)((nbSeq>>8) + 0x80), op[1] =3D (BYTE)nbSeq, op+=3D= 2; else op[0]=3D0xFF, MEM_writeLE16(op+1, (U16)(nbSeq - LONGNBSEQ)), op+= =3D3; if (nbSeq=3D=3D0) { - return op - ostart; + return (size_t)(op - ostart); } =20 /* seqHead : flags for FSE encoding type */ @@ -205,7 +210,7 @@ static size_t ZSTD_compressSubBlock_sequences(const ZST= D_fseCTables_t* fseTables } =20 { size_t const bitstreamSize 
=3D ZSTD_encodeSequences( - op, oend - op, + op, (size_t)(oend - op), fseTables->matchlengthCTable, mlCo= de, fseTables->offcodeCTable, ofCode, fseTables->litlengthCTable, llCode, @@ -249,7 +254,7 @@ static size_t ZSTD_compressSubBlock_sequences(const ZST= D_fseCTables_t* fseTables #endif =20 *entropyWritten =3D 1; - return op - ostart; + return (size_t)(op - ostart); } =20 /* ZSTD_compressSubBlock() : @@ -258,7 +263,7 @@ static size_t ZSTD_compressSubBlock_sequences(const ZST= D_fseCTables_t* fseTables * Or 0 if it failed to compress. */ static size_t ZSTD_compressSubBlock(const ZSTD_entropyCTables_t* entropy, const ZSTD_entropyCTablesMetadata_t* e= ntropyMetadata, - const seqDef* sequences, size_t nbSeq, + const SeqDef* sequences, size_t nbSeq, const BYTE* literals, size_t litSize, const BYTE* llCode, const BYTE* mlCode= , const BYTE* ofCode, const ZSTD_CCtx_params* cctxParams, @@ -275,7 +280,8 @@ static size_t ZSTD_compressSubBlock(const ZSTD_entropyC= Tables_t* entropy, litSize, nbSeq, writeLitEntropy, writeSeqEntropy, lastBloc= k); { size_t cLitSize =3D ZSTD_compressSubBlock_literal((const HUF_CElt*= )entropy->huf.CTable, &entropyMetadata->= hufMetadata, literals, litSize, - op, oend-op, bmi2,= writeLitEntropy, litEntropyWritten); + op, (size_t)(oend-= op), + bmi2, writeLitEntr= opy, litEntropyWritten); FORWARD_IF_ERROR(cLitSize, "ZSTD_compressSubBlock_literal failed"); if (cLitSize =3D=3D 0) return 0; op +=3D cLitSize; @@ -285,18 +291,18 @@ static size_t ZSTD_compressSubBlock(const ZSTD_entrop= yCTables_t* entropy, sequences, nbSeq, llCode, mlCode, ofCode, cctxParams, - op, oend-op, + op, (size_t)(oend-op), bmi2, writeSeqEntropy, s= eqEntropyWritten); FORWARD_IF_ERROR(cSeqSize, "ZSTD_compressSubBlock_sequences failed= "); if (cSeqSize =3D=3D 0) return 0; op +=3D cSeqSize; } /* Write block header */ - { size_t cSize =3D (op-ostart)-ZSTD_blockHeaderSize; + { size_t cSize =3D (size_t)(op-ostart) - ZSTD_blockHeaderSize; U32 const cBlockHeader24 =3D lastBlock + (((U32)bt_compressed)<<1)= + (U32)(cSize << 3); MEM_writeLE24(ostart, cBlockHeader24); } - return op-ostart; + return (size_t)(op-ostart); } =20 static size_t ZSTD_estimateSubBlockSize_literal(const BYTE* literals, size= _t litSize, @@ -322,7 +328,7 @@ static size_t ZSTD_estimateSubBlockSize_literal(const B= YTE* literals, size_t lit return 0; } =20 -static size_t ZSTD_estimateSubBlockSize_symbolType(symbolEncodingType_e ty= pe, +static size_t ZSTD_estimateSubBlockSize_symbolType(SymbolEncodingType_e ty= pe, const BYTE* codeTable, unsigned maxCode, size_t nbSeq, const FSE_CTable* fseCTable, const U8* additionalBits, @@ -385,7 +391,11 @@ static size_t ZSTD_estimateSubBlockSize_sequences(cons= t BYTE* ofCodeTable, return cSeqSizeEstimate + sequencesSectionHeaderSize; } =20 -static size_t ZSTD_estimateSubBlockSize(const BYTE* literals, size_t litSi= ze, +typedef struct { + size_t estLitSize; + size_t estBlockSize; +} EstimatedBlockSize; +static EstimatedBlockSize ZSTD_estimateSubBlockSize(const BYTE* literals, = size_t litSize, const BYTE* ofCodeTable, const BYTE* llCodeTable, const BYTE* mlCodeTable, @@ -393,15 +403,17 @@ static size_t ZSTD_estimateSubBlockSize(const BYTE* l= iterals, size_t litSize, const ZSTD_entropyCTables_t* entro= py, const ZSTD_entropyCTablesMetadata_= t* entropyMetadata, void* workspace, size_t wkspSize, - int writeLitEntropy, int writeSeqE= ntropy) { - size_t cSizeEstimate =3D 0; - cSizeEstimate +=3D ZSTD_estimateSubBlockSize_literal(literals, litSize, - &entropy->huf, &e= ntropyMetadata->hufMetadata, - 
workspace, wkspSi= ze, writeLitEntropy); - cSizeEstimate +=3D ZSTD_estimateSubBlockSize_sequences(ofCodeTable, ll= CodeTable, mlCodeTable, + int writeLitEntropy, int writeSeqE= ntropy) +{ + EstimatedBlockSize ebs; + ebs.estLitSize =3D ZSTD_estimateSubBlockSize_literal(literals, litSize, + &entropy->huf, &en= tropyMetadata->hufMetadata, + workspace, wkspSiz= e, writeLitEntropy); + ebs.estBlockSize =3D ZSTD_estimateSubBlockSize_sequences(ofCodeTable, = llCodeTable, mlCodeTable, nbSeq, &entropy->= fse, &entropyMetadata->fseMetadata, workspace, wkspSi= ze, writeSeqEntropy); - return cSizeEstimate + ZSTD_blockHeaderSize; + ebs.estBlockSize +=3D ebs.estLitSize + ZSTD_blockHeaderSize; + return ebs; } =20 static int ZSTD_needSequenceEntropyTables(ZSTD_fseCTablesMetadata_t const*= fseMetadata) @@ -415,14 +427,57 @@ static int ZSTD_needSequenceEntropyTables(ZSTD_fseCTa= blesMetadata_t const* fseMe return 0; } =20 +static size_t countLiterals(SeqStore_t const* seqStore, const SeqDef* sp, = size_t seqCount) +{ + size_t n, total =3D 0; + assert(sp !=3D NULL); + for (n=3D0; n %zu bytes", = seqCount, (const void*)sp, total); + return total; +} + +#define BYTESCALE 256 + +static size_t sizeBlockSequences(const SeqDef* sp, size_t nbSeqs, + size_t targetBudget, size_t avgLitCost, size_t avgSeqCost, + int firstSubBlock) +{ + size_t n, budget =3D 0, inSize=3D0; + /* entropy headers */ + size_t const headerSize =3D (size_t)firstSubBlock * 120 * BYTESCALE; /= * generous estimate */ + assert(firstSubBlock=3D=3D0 || firstSubBlock=3D=3D1); + budget +=3D headerSize; + + /* first sequence =3D> at least one sequence*/ + budget +=3D sp[0].litLength * avgLitCost + avgSeqCost; + if (budget > targetBudget) return 1; + inSize =3D sp[0].litLength + (sp[0].mlBase+MINMATCH); + + /* loop over sequences */ + for (n=3D1; n targetBudget) + /* though continue to expand until the sub-block is deemed com= pressible */ + && (budget < inSize * BYTESCALE) ) + break; + } + + return n; +} + /* ZSTD_compressSubBlock_multi() : * Breaks super-block into multiple sub-blocks and compresses them. - * Entropy will be written to the first block. - * The following blocks will use repeat mode to compress. - * All sub-blocks are compressed blocks (no raw or rle blocks). - * @return : compressed size of the super block (which is multiple ZSTD b= locks) - * Or 0 if it failed to compress. */ -static size_t ZSTD_compressSubBlock_multi(const seqStore_t* seqStorePtr, + * Entropy will be written into the first block. + * The following blocks use repeat_mode to compress. + * Sub-blocks are all compressed, except the last one when beneficial. + * @return : compressed size of the super block (which features multiple = ZSTD blocks) + * or 0 if it failed to compress. 
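+ * Sizing note (illustrative): sub-block boundaries come from
+ * sizeBlockSequences() above, which spends an average per-block byte
+ * budget expressed in 1/BYTESCALE (=3D 1/256) byte units.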
*/ +static size_t ZSTD_compressSubBlock_multi(const SeqStore_t* seqStorePtr, const ZSTD_compressedBlockState_t* prevCBlock, ZSTD_compressedBlockState_t* nextCBlock, const ZSTD_entropyCTablesMetadata_t* entropyMe= tadata, @@ -432,12 +487,14 @@ static size_t ZSTD_compressSubBlock_multi(const seqSt= ore_t* seqStorePtr, const int bmi2, U32 lastBlock, void* workspace, size_t wkspSize) { - const seqDef* const sstart =3D seqStorePtr->sequencesStart; - const seqDef* const send =3D seqStorePtr->sequences; - const seqDef* sp =3D sstart; + const SeqDef* const sstart =3D seqStorePtr->sequencesStart; + const SeqDef* const send =3D seqStorePtr->sequences; + const SeqDef* sp =3D sstart; /* tracks progress within seqStorePtr->= sequences */ + size_t const nbSeqs =3D (size_t)(send - sstart); const BYTE* const lstart =3D seqStorePtr->litStart; const BYTE* const lend =3D seqStorePtr->lit; const BYTE* lp =3D lstart; + size_t const nbLiterals =3D (size_t)(lend - lstart); BYTE const* ip =3D (BYTE const*)src; BYTE const* const iend =3D ip + srcSize; BYTE* const ostart =3D (BYTE*)dst; @@ -446,112 +503,171 @@ static size_t ZSTD_compressSubBlock_multi(const seq= Store_t* seqStorePtr, const BYTE* llCodePtr =3D seqStorePtr->llCode; const BYTE* mlCodePtr =3D seqStorePtr->mlCode; const BYTE* ofCodePtr =3D seqStorePtr->ofCode; - size_t targetCBlockSize =3D cctxParams->targetCBlockSize; - size_t litSize, seqCount; - int writeLitEntropy =3D entropyMetadata->hufMetadata.hType =3D=3D set_= compressed; + size_t const minTarget =3D ZSTD_TARGETCBLOCKSIZE_MIN; /* enforce minim= um size, to reduce undesirable side effects */ + size_t const targetCBlockSize =3D MAX(minTarget, cctxParams->targetCBl= ockSize); + int writeLitEntropy =3D (entropyMetadata->hufMetadata.hType =3D=3D set= _compressed); int writeSeqEntropy =3D 1; - int lastSequence =3D 0; - - DEBUGLOG(5, "ZSTD_compressSubBlock_multi (litSize=3D%u, nbSeq=3D%u)", - (unsigned)(lend-lp), (unsigned)(send-sstart)); - - litSize =3D 0; - seqCount =3D 0; - do { - size_t cBlockSizeEstimate =3D 0; - if (sstart =3D=3D send) { - lastSequence =3D 1; - } else { - const seqDef* const sequence =3D sp + seqCount; - lastSequence =3D sequence =3D=3D send - 1; - litSize +=3D ZSTD_getSequenceLength(seqStorePtr, sequence).lit= Length; - seqCount++; - } - if (lastSequence) { - assert(lp <=3D lend); - assert(litSize <=3D (size_t)(lend - lp)); - litSize =3D (size_t)(lend - lp); + + DEBUGLOG(5, "ZSTD_compressSubBlock_multi (srcSize=3D%u, litSize=3D%u, = nbSeq=3D%u)", + (unsigned)srcSize, (unsigned)(lend-lstart), (unsigned)(send= -sstart)); + + /* let's start with a general estimation for the full block */ + if (nbSeqs > 0) { + EstimatedBlockSize const ebs =3D + ZSTD_estimateSubBlockSize(lp, nbLiterals, + ofCodePtr, llCodePtr, mlCodePtr, n= bSeqs, + &nextCBlock->entropy, entropyMetad= ata, + workspace, wkspSize, + writeLitEntropy, writeSeqEntropy); + /* quick estimation */ + size_t const avgLitCost =3D nbLiterals ?
(ebs.estLitSize * BYTESCA= LE) / nbLiterals : BYTESCALE; + size_t const avgSeqCost =3D ((ebs.estBlockSize - ebs.estLitSize) *= BYTESCALE) / nbSeqs; + const size_t nbSubBlocks =3D MAX((ebs.estBlockSize + (targetCBlock= Size/2)) / targetCBlockSize, 1); + size_t n, avgBlockBudget, blockBudgetSupp=3D0; + avgBlockBudget =3D (ebs.estBlockSize * BYTESCALE) / nbSubBlocks; + DEBUGLOG(5, "estimated fullblock size=3D%u bytes ; avgLitCost=3D%.= 2f ; avgSeqCost=3D%.2f ; targetCBlockSize=3D%u, nbSubBlocks=3D%u ; avgBlock= Budget=3D%.0f bytes", + (unsigned)ebs.estBlockSize, (double)avgLitCost/BYTESCA= LE, (double)avgSeqCost/BYTESCALE, + (unsigned)targetCBlockSize, (unsigned)nbSubBlocks, (do= uble)avgBlockBudget/BYTESCALE); + /* simplification: if estimates states that the full superblock do= esn't compress, just bail out immediately + * this will result in the production of a single uncompressed blo= ck covering @srcSize.*/ + if (ebs.estBlockSize > srcSize) return 0; + + /* compress and write sub-blocks */ + assert(nbSubBlocks>0); + for (n=3D0; n < nbSubBlocks-1; n++) { + /* determine nb of sequences for current sub-block + nbLiteral= s from next sequence */ + size_t const seqCount =3D sizeBlockSequences(sp, (size_t)(send= -sp), + avgBlockBudget + blockBudgetSupp, = avgLitCost, avgSeqCost, n=3D=3D0); + /* if reached last sequence : break to last sub-block (simplif= ication) */ + assert(seqCount <=3D (size_t)(send-sp)); + if (sp + seqCount =3D=3D send) break; + assert(seqCount > 0); + /* compress sub-block */ + { int litEntropyWritten =3D 0; + int seqEntropyWritten =3D 0; + size_t litSize =3D countLiterals(seqStorePtr, sp, seqCount= ); + const size_t decompressedSize =3D + ZSTD_seqDecompressedSize(seqStorePtr, sp, seqCount= , litSize, 0); + size_t const cSize =3D ZSTD_compressSubBlock(&nextCBlock->= entropy, entropyMetadata, + sp, seqCount, + lp, litSize, + llCodePtr, mlCodePtr, ofCo= dePtr, + cctxParams, + op, (size_t)(oend-op), + bmi2, writeLitEntropy, wri= teSeqEntropy, + &litEntropyWritten, &seqEn= tropyWritten, + 0); + FORWARD_IF_ERROR(cSize, "ZSTD_compressSubBlock failed"); + + /* check compressibility, update state components */ + if (cSize > 0 && cSize < decompressedSize) { + DEBUGLOG(5, "Committed sub-block compressing %u bytes = =3D> %u bytes", + (unsigned)decompressedSize, (unsigned)cSiz= e); + assert(ip + decompressedSize <=3D iend); + ip +=3D decompressedSize; + lp +=3D litSize; + op +=3D cSize; + llCodePtr +=3D seqCount; + mlCodePtr +=3D seqCount; + ofCodePtr +=3D seqCount; + /* Entropy only needs to be written once */ + if (litEntropyWritten) { + writeLitEntropy =3D 0; + } + if (seqEntropyWritten) { + writeSeqEntropy =3D 0; + } + sp +=3D seqCount; + blockBudgetSupp =3D 0; + } } + /* otherwise : do not compress yet, coalesce current sub-block= with following one */ } - /* I think there is an optimization opportunity here. - * Calling ZSTD_estimateSubBlockSize for every sequence can be was= teful - * since it recalculates estimate from scratch. - * For example, it would recount literal distribution and symbol c= odes every time. 
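The code above replaces the per-sequence re-estimation lamented in the removed comment with a single whole-block estimate: per-literal and per-sequence costs are derived once in 1/256-byte fixed-point units (BYTESCALE), and sizeBlockSequences then packs sequences against a per-sub-block budget. Below is a standalone sketch of that arithmetic with made-up sizes; only BYTESCALE and the rounding scheme come from the patch, the variable names merely mirror it for readability.

#include <stddef.h>
#include <stdio.h>

#define BYTESCALE 256  /* 8.8 fixed point: 1 byte == 256 units */

int main(void)
{
    size_t const estBlockSize = 40000;  /* estimated compressed size, bytes (toy value) */
    size_t const estLitSize   = 25000;  /* literals' share of that estimate (toy value) */
    size_t const nbLiterals   = 90000;
    size_t const nbSeqs       = 12000;
    size_t const targetCBlockSize = 1300;

    /* average costs in 1/256-byte units, so integer math keeps sub-byte precision */
    size_t const avgLitCost = (estLitSize * BYTESCALE) / nbLiterals;
    size_t const avgSeqCost = ((estBlockSize - estLitSize) * BYTESCALE) / nbSeqs;
    /* nearest integer number of sub-blocks, at least 1 */
    size_t nbSubBlocks = (estBlockSize + targetCBlockSize / 2) / targetCBlockSize;
    if (nbSubBlocks < 1) nbSubBlocks = 1;
    {   size_t const avgBlockBudget = (estBlockSize * BYTESCALE) / nbSubBlocks;
        printf("lit cost %.2f B, seq cost %.2f B, %zu sub-blocks, budget %.0f B\n",
               (double)avgLitCost / BYTESCALE, (double)avgSeqCost / BYTESCALE,
               nbSubBlocks, (double)avgBlockBudget / BYTESCALE);
    }
    return 0;
}

Staying in scaled integers avoids floating point in the hot path; as in the patch, the doubles here exist only to pretty-print the result.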
- */ - cBlockSizeEstimate =3D ZSTD_estimateSubBlockSize(lp, litSize, ofCo= dePtr, llCodePtr, mlCodePtr, seqCount, - &nextCBlock->entrop= y, entropyMetadata, - workspace, wkspSize= , writeLitEntropy, writeSeqEntropy); - if (cBlockSizeEstimate > targetCBlockSize || lastSequence) { - int litEntropyWritten =3D 0; - int seqEntropyWritten =3D 0; - const size_t decompressedSize =3D ZSTD_seqDecompressedSize(seq= StorePtr, sp, seqCount, litSize, lastSequence); - const size_t cSize =3D ZSTD_compressSubBlock(&nextCBlock->entr= opy, entropyMetadata, - sp, seqCount, - lp, litSize, - llCodePtr, mlCodePt= r, ofCodePtr, - cctxParams, - op, oend-op, - bmi2, writeLitEntro= py, writeSeqEntropy, - &litEntropyWritten,= &seqEntropyWritten, - lastBlock && lastSe= quence); - FORWARD_IF_ERROR(cSize, "ZSTD_compressSubBlock failed"); - if (cSize > 0 && cSize < decompressedSize) { - DEBUGLOG(5, "Committed the sub-block"); - assert(ip + decompressedSize <=3D iend); - ip +=3D decompressedSize; - sp +=3D seqCount; - lp +=3D litSize; - op +=3D cSize; - llCodePtr +=3D seqCount; - mlCodePtr +=3D seqCount; - ofCodePtr +=3D seqCount; - litSize =3D 0; - seqCount =3D 0; - /* Entropy only needs to be written once */ - if (litEntropyWritten) { - writeLitEntropy =3D 0; - } - if (seqEntropyWritten) { - writeSeqEntropy =3D 0; - } + } /* if (nbSeqs > 0) */ + + /* write last block */ + DEBUGLOG(5, "Generate last sub-block: %u sequences remaining", (unsign= ed)(send - sp)); + { int litEntropyWritten =3D 0; + int seqEntropyWritten =3D 0; + size_t litSize =3D (size_t)(lend - lp); + size_t seqCount =3D (size_t)(send - sp); + const size_t decompressedSize =3D + ZSTD_seqDecompressedSize(seqStorePtr, sp, seqCount, litSiz= e, 1); + size_t const cSize =3D ZSTD_compressSubBlock(&nextCBlock->entropy,= entropyMetadata, + sp, seqCount, + lp, litSize, + llCodePtr, mlCodePtr, ofCodePt= r, + cctxParams, + op, (size_t)(oend-op), + bmi2, writeLitEntropy, writeSe= qEntropy, + &litEntropyWritten, &seqEntrop= yWritten, + lastBlock); + FORWARD_IF_ERROR(cSize, "ZSTD_compressSubBlock failed"); + + /* update pointers, the nb of literals borrowed from next sequence= must be preserved */ + if (cSize > 0 && cSize < decompressedSize) { + DEBUGLOG(5, "Last sub-block compressed %u bytes =3D> %u bytes", + (unsigned)decompressedSize, (unsigned)cSize); + assert(ip + decompressedSize <=3D iend); + ip +=3D decompressedSize; + lp +=3D litSize; + op +=3D cSize; + llCodePtr +=3D seqCount; + mlCodePtr +=3D seqCount; + ofCodePtr +=3D seqCount; + /* Entropy only needs to be written once */ + if (litEntropyWritten) { + writeLitEntropy =3D 0; } + if (seqEntropyWritten) { + writeSeqEntropy =3D 0; + } + sp +=3D seqCount; } - } while (!lastSequence); + } + + if (writeLitEntropy) { - DEBUGLOG(5, "ZSTD_compressSubBlock_multi has literal entropy table= s unwritten"); + DEBUGLOG(5, "Literal entropy tables were never written"); ZSTD_memcpy(&nextCBlock->entropy.huf, &prevCBlock->entropy.huf, si= zeof(prevCBlock->entropy.huf)); } if (writeSeqEntropy && ZSTD_needSequenceEntropyTables(&entropyMetadata= ->fseMetadata)) { /* If we haven't written our entropy tables, then we've violated o= ur contract and * must emit an uncompressed block. 
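A sub-block is committed above only when compression genuinely paid off; otherwise its sequences are coalesced into the next sub-block. That is also how the situation described in this comment arises: if no sub-block ever commits, the entropy tables are never written and the function must fall back to an uncompressed block. A toy rendering of the commit test follows (the helper name is hypothetical, not from the patch):

#include <stddef.h>
#include <stdio.h>

/* Keep a sub-block only if compression succeeded (cSize != 0) and it
 * actually saved space; otherwise the caller merges its sequences into
 * the following sub-block. */
static int commitSubBlock(size_t cSize, size_t decompressedSize)
{
    return (cSize > 0) && (cSize < decompressedSize);
}

int main(void)
{
    printf("%d\n", commitSubBlock(900, 1300));  /* 1: smaller, commit   */
    printf("%d\n", commitSubBlock(0,   1300));  /* 0: failed to shrink  */
    printf("%d\n", commitSubBlock(1400, 1300)); /* 0: expanded, coalesce */
    return 0;
}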
*/ - DEBUGLOG(5, "ZSTD_compressSubBlock_multi has sequence entropy tabl= es unwritten"); + DEBUGLOG(5, "Sequence entropy tables were never written =3D> cance= l, emit an uncompressed block"); return 0; } + if (ip < iend) { - size_t const cSize =3D ZSTD_noCompressBlock(op, oend - op, ip, ien= d - ip, lastBlock); - DEBUGLOG(5, "ZSTD_compressSubBlock_multi last sub-block uncompress= ed, %zu bytes", (size_t)(iend - ip)); + /* some data left : last part of the block sent uncompressed */ + size_t const rSize =3D (size_t)((iend - ip)); + size_t const cSize =3D ZSTD_noCompressBlock(op, (size_t)(oend - op= ), ip, rSize, lastBlock); + DEBUGLOG(5, "Generate last uncompressed sub-block of %u bytes", (u= nsigned)(rSize)); FORWARD_IF_ERROR(cSize, "ZSTD_noCompressBlock failed"); assert(cSize !=3D 0); op +=3D cSize; /* We have to regenerate the repcodes because we've skipped some s= equences */ if (sp < send) { - seqDef const* seq; - repcodes_t rep; + const SeqDef* seq; + Repcodes_t rep; ZSTD_memcpy(&rep, prevCBlock->rep, sizeof(rep)); for (seq =3D sstart; seq < sp; ++seq) { - ZSTD_updateRep(rep.rep, seq->offBase - 1, ZSTD_getSequence= Length(seqStorePtr, seq).litLength =3D=3D 0); + ZSTD_updateRep(rep.rep, seq->offBase, ZSTD_getSequenceLeng= th(seqStorePtr, seq).litLength =3D=3D 0); } ZSTD_memcpy(nextCBlock->rep, &rep, sizeof(rep)); } } - DEBUGLOG(5, "ZSTD_compressSubBlock_multi compressed"); - return op-ostart; + + DEBUGLOG(5, "ZSTD_compressSubBlock_multi compressed all subBlocks: tot= al compressed size =3D %u", + (unsigned)(op-ostart)); + return (size_t)(op-ostart); } =20 size_t ZSTD_compressSuperBlock(ZSTD_CCtx* zc, void* dst, size_t dstCapacity, - void const* src, size_t srcSize, - unsigned lastBlock) { + const void* src, size_t srcSize, + unsigned lastBlock) +{ ZSTD_entropyCTablesMetadata_t entropyMetadata; =20 FORWARD_IF_ERROR(ZSTD_buildBlockEntropyStats(&zc->seqStore, @@ -559,7 +675,7 @@ size_t ZSTD_compressSuperBlock(ZSTD_CCtx* zc, &zc->blockState.nextCBlock->entropy, &zc->appliedParams, &entropyMetadata, - zc->entropyWorkspace, ENTROPY_WORKSPACE_SIZE /* statically alloc= ated in resetCCtx */), ""); + zc->tmpWorkspace, zc->tmpWkspSize /* statically allocated in res= etCCtx */), ""); =20 return ZSTD_compressSubBlock_multi(&zc->seqStore, zc->blockState.prevCBlock, @@ -569,5 +685,5 @@ size_t ZSTD_compressSuperBlock(ZSTD_CCtx* zc, dst, dstCapacity, src, srcSize, zc->bmi2, lastBlock, - zc->entropyWorkspace, ENTROPY_WORKSPACE_SIZE /* statically all= ocated in resetCCtx */); + zc->tmpWorkspace, zc->tmpWkspSize /* statically allocated in r= esetCCtx */); } diff --git a/lib/zstd/compress/zstd_compress_superblock.h b/lib/zstd/compre= ss/zstd_compress_superblock.h index 224ece79546e..826bbc9e029b 100644 --- a/lib/zstd/compress/zstd_compress_superblock.h +++ b/lib/zstd/compress/zstd_compress_superblock.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the diff --git a/lib/zstd/compress/zstd_cwksp.h b/lib/zstd/compress/zstd_cwksp.h index 349fc923c355..dce42f653bae 100644 --- a/lib/zstd/compress/zstd_cwksp.h +++ b/lib/zstd/compress/zstd_cwksp.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -14,8 +15,10 @@ /*-************************************* * Dependencies ***************************************/ +#include "../common/allocations.h" /* ZSTD_customMalloc, ZSTD_customFree = */ #include "../common/zstd_internal.h" - +#include "../common/portability_macros.h" +#include "../common/compiler.h" /* ZSTD_isPower2 */ =20 /*-************************************* * Constants @@ -41,8 +44,9 @@ ***************************************/ typedef enum { ZSTD_cwksp_alloc_objects, - ZSTD_cwksp_alloc_buffers, - ZSTD_cwksp_alloc_aligned + ZSTD_cwksp_alloc_aligned_init_once, + ZSTD_cwksp_alloc_aligned, + ZSTD_cwksp_alloc_buffers } ZSTD_cwksp_alloc_phase_e; =20 /* @@ -95,8 +99,8 @@ typedef enum { * * Workspace Layout: * - * [ ... workspace ... ] - * [objects][tables ... ->] free space [<- ... aligned][<- ... buffers] + * [ ... workspace ... ] + * [objects][tables ->] free space [<- buffers][<- aligned][<- init once] * * The various objects that live in the workspace are divided into the * following categories, and are allocated separately: @@ -120,9 +124,18 @@ typedef enum { * uint32_t arrays, all of whose values are between 0 and (nextSrc - bas= e). * Their sizes depend on the cparams. These tables are 64-byte aligned. * - * - Aligned: these buffers are used for various purposes that require 4 b= yte - * alignment, but don't require any initialization before they're used. = These - * buffers are each aligned to 64 bytes. + * - Init once: these buffers need to be initialized at least once befo= re + * use. They should be used when we want to skip memory initialization + * while not triggering memory checkers (like Valgrind) when reading from + * this memory without writing to it first. + * These buffers should be used carefully as they might contain data + * from previous compressions. + * Buffers are aligned to 64 bytes. + * + * - Aligned: these buffers don't require any initialization before they're + * used. The user of the buffer should make sure they write into a buffer + * location before reading from it. + * Buffers are aligned to 64 bytes. * * - Buffers: these buffers are used for various purposes that don't requi= re * any alignment or initialization before they're used. This means they = can @@ -134,8 +147,9 @@ typedef enum { * correctly packed into the workspace buffer. That order is: * * 1. Objects - * 2. Buffers - * 3. Aligned/Tables + * 2. Init once / Tables + * 3. Aligned / Tables + * 4. Buffers / Tables * * Attempts to reserve objects of different types out of order will fail. */ @@ -147,6 +161,7 @@ typedef struct { void* tableEnd; void* tableValidEnd; void* allocStart; + void* initOnceStart; =20 BYTE allocFailed; int workspaceOversizedDuration; @@ -159,6 +174,7 @@ typedef struct { ***************************************/ =20 MEM_STATIC size_t ZSTD_cwksp_available_space(ZSTD_cwksp* ws); +MEM_STATIC void* ZSTD_cwksp_initialAllocStart(ZSTD_cwksp* ws); =20 MEM_STATIC void ZSTD_cwksp_assert_internal_consistency(ZSTD_cwksp* ws) { (void)ws; @@ -168,14 +184,16 @@ MEM_STATIC void ZSTD_cwksp_assert_internal_consistenc= y(ZSTD_cwksp* ws) { assert(ws->tableEnd <=3D ws->allocStart); assert(ws->tableValidEnd <=3D ws->allocStart); assert(ws->allocStart <=3D ws->workspaceEnd); + assert(ws->initOnceStart <=3D ZSTD_cwksp_initialAllocStart(ws)); + assert(ws->workspace <=3D ws->initOnceStart); } =20 /* * Align must be a power of 2.
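ZSTD_cwksp_align, defined just below this comment, rounds a size up with the standard power-of-two mask identity: adding align-1 and then clearing the low bits lands on the next boundary. A minimal self-contained check of that identity (alignUp is an illustrative stand-in for the real function):

#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Round size up to a multiple of align; align must be a power of 2,
 * so align-1 sets exactly the bits below the boundary. */
static size_t alignUp(size_t size, size_t align)
{
    size_t const mask = align - 1;
    assert((align & mask) == 0);   /* power-of-2 check, as the patch asserts */
    return (size + mask) & ~mask;
}

int main(void)
{
    printf("%zu %zu %zu\n", alignUp(1, 64), alignUp(64, 64), alignUp(65, 64));
    /* prints: 64 64 128 */
    return 0;
}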
*/ -MEM_STATIC size_t ZSTD_cwksp_align(size_t size, size_t const align) { +MEM_STATIC size_t ZSTD_cwksp_align(size_t size, size_t align) { size_t const mask =3D align - 1; - assert((align & mask) =3D=3D 0); + assert(ZSTD_isPower2(align)); return (size + mask) & ~mask; } =20 @@ -189,7 +207,7 @@ MEM_STATIC size_t ZSTD_cwksp_align(size_t size, size_t = const align) { * to figure out how much space you need for the matchState tables. Everyt= hing * else is though. * - * Do not use for sizing aligned buffers. Instead, use ZSTD_cwksp_aligned_= alloc_size(). + * Do not use for sizing aligned buffers. Instead, use ZSTD_cwksp_aligned6= 4_alloc_size(). */ MEM_STATIC size_t ZSTD_cwksp_alloc_size(size_t size) { if (size =3D=3D 0) @@ -197,12 +215,16 @@ MEM_STATIC size_t ZSTD_cwksp_alloc_size(size_t size) { return size; } =20 +MEM_STATIC size_t ZSTD_cwksp_aligned_alloc_size(size_t size, size_t alignm= ent) { + return ZSTD_cwksp_alloc_size(ZSTD_cwksp_align(size, alignment)); +} + /* * Returns an adjusted alloc size that is the nearest larger multiple of 6= 4 bytes. * Used to determine the number of bytes required for a given "aligned". */ -MEM_STATIC size_t ZSTD_cwksp_aligned_alloc_size(size_t size) { - return ZSTD_cwksp_alloc_size(ZSTD_cwksp_align(size, ZSTD_CWKSP_ALIGNME= NT_BYTES)); +MEM_STATIC size_t ZSTD_cwksp_aligned64_alloc_size(size_t size) { + return ZSTD_cwksp_aligned_alloc_size(size, ZSTD_CWKSP_ALIGNMENT_BYTES); } =20 /* @@ -210,14 +232,10 @@ MEM_STATIC size_t ZSTD_cwksp_aligned_alloc_size(size_= t size) { * for internal purposes (currently only alignment). */ MEM_STATIC size_t ZSTD_cwksp_slack_space_required(void) { - /* For alignment, the wksp will always allocate an additional n_1=3D[1= , 64] bytes - * to align the beginning of tables section, as well as another n_2=3D= [0, 63] bytes - * to align the beginning of the aligned section. - * - * n_1 + n_2 =3D=3D 64 bytes if the cwksp is freshly allocated, due to= tables and - * aligneds being sized in multiples of 64 bytes. + /* For alignment, the wksp will always allocate an additional 2*ZSTD_C= WKSP_ALIGNMENT_BYTES + * bytes to align the beginning of tables section and end of buffers; */ - size_t const slackSpace =3D ZSTD_CWKSP_ALIGNMENT_BYTES; + size_t const slackSpace =3D ZSTD_CWKSP_ALIGNMENT_BYTES * 2; return slackSpace; } =20 @@ -229,11 +247,23 @@ MEM_STATIC size_t ZSTD_cwksp_slack_space_required(voi= d) { MEM_STATIC size_t ZSTD_cwksp_bytes_to_align_ptr(void* ptr, const size_t al= ignBytes) { size_t const alignBytesMask =3D alignBytes - 1; size_t const bytes =3D (alignBytes - ((size_t)ptr & (alignBytesMask)))= & alignBytesMask; - assert((alignBytes & alignBytesMask) =3D=3D 0); - assert(bytes !=3D ZSTD_CWKSP_ALIGNMENT_BYTES); + assert(ZSTD_isPower2(alignBytes)); + assert(bytes < alignBytes); return bytes; } =20 +/* + * Returns the initial value for allocStart which is used to determine the= position from + * which we can allocate from the end of the workspace. + */ +MEM_STATIC void* ZSTD_cwksp_initialAllocStart(ZSTD_cwksp* ws) +{ + char* endPtr =3D (char*)ws->workspaceEnd; + assert(ZSTD_isPower2(ZSTD_CWKSP_ALIGNMENT_BYTES)); + endPtr =3D endPtr - ((size_t)endPtr % ZSTD_CWKSP_ALIGNMENT_BYTES); + return (void*)endPtr; +} + /* * Internal function. Do not use directly. 
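ZSTD_cwksp_initialAllocStart above anchors the downward-growing segment: since buffers and aligneds are carved from the end of the workspace, the starting point is the end pointer rounded down to a 64-byte boundary. A small sketch of that rounding, with ALIGN standing in for ZSTD_CWKSP_ALIGNMENT_BYTES (the patch casts through size_t; uintptr_t is used here for portability):

#include <stdint.h>
#include <stdio.h>

#define ALIGN 64

/* Round an end pointer *down* to the nearest 64-byte boundary, mirroring
 * ZSTD_cwksp_initialAllocStart(): allocations then grow downward from it. */
static char* initialAllocStart(char* workspaceEnd)
{
    return workspaceEnd - ((uintptr_t)workspaceEnd % ALIGN);
}

int main(void)
{
    char buffer[256];
    char* end = buffer + sizeof(buffer);
    char* start = initialAllocStart(end);
    printf("dropped %td bytes to reach a 64-byte boundary\n", end - start);
    return 0;
}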
* Reserves the given number of bytes within the aligned/buffer segment of= the wksp, @@ -246,7 +276,7 @@ ZSTD_cwksp_reserve_internal_buffer_space(ZSTD_cwksp* ws= , size_t const bytes) { void* const alloc =3D (BYTE*)ws->allocStart - bytes; void* const bottom =3D ws->tableEnd; - DEBUGLOG(5, "cwksp: reserving %p %zd bytes, %zd bytes remaining", + DEBUGLOG(5, "cwksp: reserving [0x%p]:%zd bytes; %zd bytes remaining", alloc, bytes, ZSTD_cwksp_available_space(ws) - bytes); ZSTD_cwksp_assert_internal_consistency(ws); assert(alloc >=3D bottom); @@ -274,27 +304,16 @@ ZSTD_cwksp_internal_advance_phase(ZSTD_cwksp* ws, ZST= D_cwksp_alloc_phase_e phase { assert(phase >=3D ws->phase); if (phase > ws->phase) { - /* Going from allocating objects to allocating buffers */ - if (ws->phase < ZSTD_cwksp_alloc_buffers && - phase >=3D ZSTD_cwksp_alloc_buffers) { + /* Going from allocating objects to allocating initOnce / tables */ + if (ws->phase < ZSTD_cwksp_alloc_aligned_init_once && + phase >=3D ZSTD_cwksp_alloc_aligned_init_once) { ws->tableValidEnd =3D ws->objectEnd; - } + ws->initOnceStart =3D ZSTD_cwksp_initialAllocStart(ws); =20 - /* Going from allocating buffers to allocating aligneds/tables */ - if (ws->phase < ZSTD_cwksp_alloc_aligned && - phase >=3D ZSTD_cwksp_alloc_aligned) { - { /* Align the start of the "aligned" to 64 bytes. Use [1, 6= 4] bytes. */ - size_t const bytesToAlign =3D - ZSTD_CWKSP_ALIGNMENT_BYTES - ZSTD_cwksp_bytes_to_align= _ptr(ws->allocStart, ZSTD_CWKSP_ALIGNMENT_BYTES); - DEBUGLOG(5, "reserving aligned alignment addtl space: %zu"= , bytesToAlign); - ZSTD_STATIC_ASSERT((ZSTD_CWKSP_ALIGNMENT_BYTES & (ZSTD_CWK= SP_ALIGNMENT_BYTES - 1)) =3D=3D 0); /* power of 2 */ - RETURN_ERROR_IF(!ZSTD_cwksp_reserve_internal_buffer_space(= ws, bytesToAlign), - memory_allocation, "aligned phase - alignm= ent initial allocation failed!"); - } { /* Align the start of the tables to 64 bytes. Use [0, 63] = bytes */ - void* const alloc =3D ws->objectEnd; + void *const alloc =3D ws->objectEnd; size_t const bytesToAlign =3D ZSTD_cwksp_bytes_to_align_pt= r(alloc, ZSTD_CWKSP_ALIGNMENT_BYTES); - void* const objectEnd =3D (BYTE*)alloc + bytesToAlign; + void *const objectEnd =3D (BYTE *) alloc + bytesToAlign; DEBUGLOG(5, "reserving table alignment addtl space: %zu", = bytesToAlign); RETURN_ERROR_IF(objectEnd > ws->workspaceEnd, memory_alloc= ation, "table phase - alignment initial allocatio= n failed!"); @@ -302,7 +321,9 @@ ZSTD_cwksp_internal_advance_phase(ZSTD_cwksp* ws, ZSTD_= cwksp_alloc_phase_e phase ws->tableEnd =3D objectEnd; /* table area starts being em= pty */ if (ws->tableValidEnd < ws->tableEnd) { ws->tableValidEnd =3D ws->tableEnd; - } } } + } + } + } ws->phase =3D phase; ZSTD_cwksp_assert_internal_consistency(ws); } @@ -314,7 +335,7 @@ ZSTD_cwksp_internal_advance_phase(ZSTD_cwksp* ws, ZSTD_= cwksp_alloc_phase_e phase */ MEM_STATIC int ZSTD_cwksp_owns_buffer(const ZSTD_cwksp* ws, const void* pt= r) { - return (ptr !=3D NULL) && (ws->workspace <=3D ptr) && (ptr <=3D ws->wo= rkspaceEnd); + return (ptr !=3D NULL) && (ws->workspace <=3D ptr) && (ptr < ws->works= paceEnd); } =20 /* @@ -345,29 +366,61 @@ MEM_STATIC BYTE* ZSTD_cwksp_reserve_buffer(ZSTD_cwksp= * ws, size_t bytes) =20 /* * Reserves and returns memory sized on and aligned on ZSTD_CWKSP_ALIGNMEN= T_BYTES (64 bytes). + * This memory has been initialized at least once in the past. + * This doesn't mean it has been initialized this time, and it might conta= in data from previous + * operations. 
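The init-once discipline described above boils down to a low-water mark: everything at or above initOnceStart has been zeroed at least once already, so a new reservation only needs to clear the not-yet-touched prefix before lowering the mark. A simplified standalone model of that zeroing logic follows; the real ZSTD_cwksp_reserve_aligned_init_once also handles alignment, allocation failure, and the sanitizer caveats spelled out in its comment.

#include <stddef.h>
#include <stdio.h>
#include <string.h>

static char workspace[1024];
/* low-water mark: bytes at or above this offset were zeroed before */
static char* initOnceStart = workspace + sizeof(workspace);

static void* reserveInitOnce(char* ptr, size_t bytes)
{
    if (ptr < initOnceStart) {
        /* only the fresh region [ptr, initOnceStart) needs clearing */
        size_t const fresh = (size_t)(initOnceStart - ptr);
        memset(ptr, 0, fresh < bytes ? fresh : bytes);
        initOnceStart = ptr;
    }
    return ptr;
}

int main(void)
{
    reserveInitOnce(workspace + 768, 256);  /* zeroes 256 fresh bytes      */
    reserveInitOnce(workspace + 512, 512);  /* zeroes only [512,768) now   */
    printf("mark now at offset %td\n", initOnceStart - workspace);
    return 0;
}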
+ * The main usage is for algorithms that might need read access into unini= tialized memory. + * The algorithm must maintain safety under these conditions and must make= sure it doesn't + * leak any of the past data (directly or in side channels). */ -MEM_STATIC void* ZSTD_cwksp_reserve_aligned(ZSTD_cwksp* ws, size_t bytes) +MEM_STATIC void* ZSTD_cwksp_reserve_aligned_init_once(ZSTD_cwksp* ws, size= _t bytes) { - void* ptr =3D ZSTD_cwksp_reserve_internal(ws, ZSTD_cwksp_align(bytes, = ZSTD_CWKSP_ALIGNMENT_BYTES), - ZSTD_cwksp_alloc_aligned); - assert(((size_t)ptr & (ZSTD_CWKSP_ALIGNMENT_BYTES-1))=3D=3D 0); + size_t const alignedBytes =3D ZSTD_cwksp_align(bytes, ZSTD_CWKSP_ALIGN= MENT_BYTES); + void* ptr =3D ZSTD_cwksp_reserve_internal(ws, alignedBytes, ZSTD_cwksp= _alloc_aligned_init_once); + assert(((size_t)ptr & (ZSTD_CWKSP_ALIGNMENT_BYTES-1)) =3D=3D 0); + if(ptr && ptr < ws->initOnceStart) { + /* We assume the memory following the current allocation is either: + * 1. Not usable as initOnce memory (end of workspace) + * 2. Another initOnce buffer that has been allocated before (and = so was previously memset) + * 3. An ASAN redzone, in which case we don't want to write on it + * For these reasons it should be fine to not explicitly zero ever= y byte up to ws->initOnceStart. + * Note that we assume here that MSAN and ASAN cannot run in the s= ame time. */ + ZSTD_memset(ptr, 0, MIN((size_t)((U8*)ws->initOnceStart - (U8*)ptr= ), alignedBytes)); + ws->initOnceStart =3D ptr; + } + return ptr; +} + +/* + * Reserves and returns memory sized on and aligned on ZSTD_CWKSP_ALIGNMEN= T_BYTES (64 bytes). + */ +MEM_STATIC void* ZSTD_cwksp_reserve_aligned64(ZSTD_cwksp* ws, size_t bytes) +{ + void* const ptr =3D ZSTD_cwksp_reserve_internal(ws, + ZSTD_cwksp_align(bytes, ZSTD_CWKSP_ALIGNMENT_BYTES= ), + ZSTD_cwksp_alloc_aligned); + assert(((size_t)ptr & (ZSTD_CWKSP_ALIGNMENT_BYTES-1)) =3D=3D 0); return ptr; } =20 /* * Aligned on 64 bytes. These buffers have the special property that - * their values remain constrained, allowing us to re-use them without + * their values remain constrained, allowing us to reuse them without * memset()-ing them. */ MEM_STATIC void* ZSTD_cwksp_reserve_table(ZSTD_cwksp* ws, size_t bytes) { - const ZSTD_cwksp_alloc_phase_e phase =3D ZSTD_cwksp_alloc_aligned; + const ZSTD_cwksp_alloc_phase_e phase =3D ZSTD_cwksp_alloc_aligned_init= _once; void* alloc; void* end; void* top; =20 - if (ZSTD_isError(ZSTD_cwksp_internal_advance_phase(ws, phase))) { - return NULL; + /* We can only start allocating tables after we are done reserving spa= ce for objects at the + * start of the workspace */ + if(ws->phase < phase) { + if (ZSTD_isError(ZSTD_cwksp_internal_advance_phase(ws, phase))) { + return NULL; + } } alloc =3D ws->tableEnd; end =3D (BYTE *)alloc + bytes; @@ -387,7 +440,7 @@ MEM_STATIC void* ZSTD_cwksp_reserve_table(ZSTD_cwksp* w= s, size_t bytes) =20 =20 assert((bytes & (ZSTD_CWKSP_ALIGNMENT_BYTES-1)) =3D=3D 0); - assert(((size_t)alloc & (ZSTD_CWKSP_ALIGNMENT_BYTES-1))=3D=3D 0); + assert(((size_t)alloc & (ZSTD_CWKSP_ALIGNMENT_BYTES-1)) =3D=3D 0); return alloc; } =20 @@ -421,6 +474,20 @@ MEM_STATIC void* ZSTD_cwksp_reserve_object(ZSTD_cwksp*= ws, size_t bytes) =20 return alloc; } +/* + * with alignment control + * Note : should happen only once, at workspace first initialization + */ +MEM_STATIC void* ZSTD_cwksp_reserve_object_aligned(ZSTD_cwksp* ws, size_t = byteSize, size_t alignment) +{ + size_t const mask =3D alignment - 1; + size_t const surplus =3D (alignment > sizeof(void*)) ? 
alignment - siz= eof(void*) : 0; + void* const start =3D ZSTD_cwksp_reserve_object(ws, byteSize + surplus= ); + if (start =3D=3D NULL) return NULL; + if (surplus =3D=3D 0) return start; + assert(ZSTD_isPower2(alignment)); + return (void*)(((size_t)start + surplus) & ~mask); +} =20 MEM_STATIC void ZSTD_cwksp_mark_tables_dirty(ZSTD_cwksp* ws) { @@ -451,7 +518,7 @@ MEM_STATIC void ZSTD_cwksp_clean_tables(ZSTD_cwksp* ws)= { assert(ws->tableValidEnd >=3D ws->objectEnd); assert(ws->tableValidEnd <=3D ws->allocStart); if (ws->tableValidEnd < ws->tableEnd) { - ZSTD_memset(ws->tableValidEnd, 0, (BYTE*)ws->tableEnd - (BYTE*)ws-= >tableValidEnd); + ZSTD_memset(ws->tableValidEnd, 0, (size_t)((BYTE*)ws->tableEnd - (= BYTE*)ws->tableValidEnd)); } ZSTD_cwksp_mark_tables_clean(ws); } @@ -460,7 +527,8 @@ MEM_STATIC void ZSTD_cwksp_clean_tables(ZSTD_cwksp* ws)= { * Invalidates table allocations. * All other allocations remain valid. */ -MEM_STATIC void ZSTD_cwksp_clear_tables(ZSTD_cwksp* ws) { +MEM_STATIC void ZSTD_cwksp_clear_tables(ZSTD_cwksp* ws) +{ DEBUGLOG(4, "cwksp: clearing tables!"); =20 =20 @@ -478,14 +546,23 @@ MEM_STATIC void ZSTD_cwksp_clear(ZSTD_cwksp* ws) { =20 =20 ws->tableEnd =3D ws->objectEnd; - ws->allocStart =3D ws->workspaceEnd; + ws->allocStart =3D ZSTD_cwksp_initialAllocStart(ws); ws->allocFailed =3D 0; - if (ws->phase > ZSTD_cwksp_alloc_buffers) { - ws->phase =3D ZSTD_cwksp_alloc_buffers; + if (ws->phase > ZSTD_cwksp_alloc_aligned_init_once) { + ws->phase =3D ZSTD_cwksp_alloc_aligned_init_once; } ZSTD_cwksp_assert_internal_consistency(ws); } =20 +MEM_STATIC size_t ZSTD_cwksp_sizeof(const ZSTD_cwksp* ws) { + return (size_t)((BYTE*)ws->workspaceEnd - (BYTE*)ws->workspace); +} + +MEM_STATIC size_t ZSTD_cwksp_used(const ZSTD_cwksp* ws) { + return (size_t)((BYTE*)ws->tableEnd - (BYTE*)ws->workspace) + + (size_t)((BYTE*)ws->workspaceEnd - (BYTE*)ws->allocStart); +} + /* * The provided workspace takes ownership of the buffer [start, start+size= ). * Any existing values in the workspace are ignored (the previously managed @@ -498,6 +575,7 @@ MEM_STATIC void ZSTD_cwksp_init(ZSTD_cwksp* ws, void* s= tart, size_t size, ZSTD_c ws->workspaceEnd =3D (BYTE*)start + size; ws->objectEnd =3D ws->workspace; ws->tableValidEnd =3D ws->objectEnd; + ws->initOnceStart =3D ZSTD_cwksp_initialAllocStart(ws); ws->phase =3D ZSTD_cwksp_alloc_objects; ws->isStatic =3D isStatic; ZSTD_cwksp_clear(ws); @@ -529,15 +607,6 @@ MEM_STATIC void ZSTD_cwksp_move(ZSTD_cwksp* dst, ZSTD_= cwksp* src) { ZSTD_memset(src, 0, sizeof(ZSTD_cwksp)); } =20 -MEM_STATIC size_t ZSTD_cwksp_sizeof(const ZSTD_cwksp* ws) { - return (size_t)((BYTE*)ws->workspaceEnd - (BYTE*)ws->workspace); -} - -MEM_STATIC size_t ZSTD_cwksp_used(const ZSTD_cwksp* ws) { - return (size_t)((BYTE*)ws->tableEnd - (BYTE*)ws->workspace) - + (size_t)((BYTE*)ws->workspaceEnd - (BYTE*)ws->allocStart); -} - MEM_STATIC int ZSTD_cwksp_reserve_failed(const ZSTD_cwksp* ws) { return ws->allocFailed; } @@ -550,17 +619,11 @@ MEM_STATIC int ZSTD_cwksp_reserve_failed(const ZSTD_c= wksp* ws) { * Returns if the estimated space needed for a wksp is within an acceptabl= e limit of the * actual amount of space used. 
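Because an alignment gap can now appear both before the tables and before the downward-growing segment, the reworked bound check below accepts any usage within 2 * ZSTD_CWKSP_ALIGNMENT_BYTES below the estimate, replacing the old +/-63-byte window. A toy version of the predicate, with made-up numbers:

#include <stddef.h>
#include <stdio.h>

#define ALIGNMENT 64
#define SLACK (2 * ALIGNMENT)  /* one gap per alignment boundary crossed */

static int withinBounds(size_t used, size_t estimated)
{
    return (estimated - SLACK) <= used && used <= estimated;
}

int main(void)
{
    printf("%d %d %d\n",
           withinBounds(1000, 1000),   /* 1: exact        */
           withinBounds(880,  1000),   /* 1: within slack */
           withinBounds(860,  1000));  /* 0: too far off  */
    return 0;
}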
*/ -MEM_STATIC int ZSTD_cwksp_estimated_space_within_bounds(const ZSTD_cwksp* = const ws, - size_t const estim= atedSpace, int resizedWorkspace) { - if (resizedWorkspace) { - /* Resized/newly allocated wksp should have exact bounds */ - return ZSTD_cwksp_used(ws) =3D=3D estimatedSpace; - } else { - /* Due to alignment, when reusing a workspace, we can actually con= sume 63 fewer or more bytes - * than estimatedSpace. See the comments in zstd_cwksp.h for detai= ls. - */ - return (ZSTD_cwksp_used(ws) >=3D estimatedSpace - 63) && (ZSTD_cwk= sp_used(ws) <=3D estimatedSpace + 63); - } +MEM_STATIC int ZSTD_cwksp_estimated_space_within_bounds(const ZSTD_cwksp *= const ws, size_t const estimatedSpace) { + /* We have an alignment space between objects and tables between table= s and buffers, so we can have up to twice + * the alignment bytes difference between estimation and actual usage = */ + return (estimatedSpace - ZSTD_cwksp_slack_space_required()) <=3D ZSTD_= cwksp_used(ws) && + ZSTD_cwksp_used(ws) <=3D estimatedSpace; } =20 =20 @@ -591,5 +654,4 @@ MEM_STATIC void ZSTD_cwksp_bump_oversized_duration( } } =20 - #endif /* ZSTD_CWKSP_H */ diff --git a/lib/zstd/compress/zstd_double_fast.c b/lib/zstd/compress/zstd_= double_fast.c index 76933dea2624..995e83f3a183 100644 --- a/lib/zstd/compress/zstd_double_fast.c +++ b/lib/zstd/compress/zstd_double_fast.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -11,8 +12,49 @@ #include "zstd_compress_internal.h" #include "zstd_double_fast.h" =20 +#ifndef ZSTD_EXCLUDE_DFAST_BLOCK_COMPRESSOR =20 -void ZSTD_fillDoubleHashTable(ZSTD_matchState_t* ms, +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +void ZSTD_fillDoubleHashTableForCDict(ZSTD_MatchState_t* ms, + void const* end, ZSTD_dictTableLoadMethod_e = dtlm) +{ + const ZSTD_compressionParameters* const cParams =3D &ms->cParams; + U32* const hashLarge =3D ms->hashTable; + U32 const hBitsL =3D cParams->hashLog + ZSTD_SHORT_CACHE_TAG_BITS; + U32 const mls =3D cParams->minMatch; + U32* const hashSmall =3D ms->chainTable; + U32 const hBitsS =3D cParams->chainLog + ZSTD_SHORT_CACHE_TAG_BITS; + const BYTE* const base =3D ms->window.base; + const BYTE* ip =3D base + ms->nextToUpdate; + const BYTE* const iend =3D ((const BYTE*)end) - HASH_READ_SIZE; + const U32 fastHashFillStep =3D 3; + + /* Always insert every fastHashFillStep position into the hash tables. + * Insert the other positions into the large hash table if their entry + * is empty. 
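The CDict fill path above stores "tagged" indices: the hash is computed with ZSTD_SHORT_CACHE_TAG_BITS extra bits, the high bits select the table slot, and the low bits ride along with the position so a later probe can reject most false candidates without dereferencing dictionary memory. A sketch of that packing, assuming a tag width of 8 bits (the constant's actual value is not shown in this hunk):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TAG_BITS 8  /* assumed stand-in for ZSTD_SHORT_CACHE_TAG_BITS */

/* Store the position with the hash's low TAG_BITS packed beside it. */
static void writeTaggedIndex(uint32_t* table, size_t hashAndTag, uint32_t index)
{
    uint32_t const tag = (uint32_t)hashAndTag & ((1u << TAG_BITS) - 1);
    table[hashAndTag >> TAG_BITS] = (index << TAG_BITS) | tag;
}

/* A probe first compares tags; only a tag hit warrants a memory access. */
static int tagsMatch(uint32_t packed, size_t hashAndTag)
{
    uint32_t const mask = (1u << TAG_BITS) - 1;
    return (packed & mask) == ((uint32_t)hashAndTag & mask);
}

int main(void)
{
    static uint32_t table[1u << 12];
    size_t const hashAndTag = 0xABCDEu;   /* pretend hash output */
    writeTaggedIndex(table, hashAndTag, 42);
    printf("tag match: %d, index: %u\n",
           tagsMatch(table[hashAndTag >> TAG_BITS], hashAndTag),
           table[hashAndTag >> TAG_BITS] >> TAG_BITS);
    return 0;
}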
+ */ + for (; ip + fastHashFillStep - 1 <=3D iend; ip +=3D fastHashFillStep) { + U32 const curr =3D (U32)(ip - base); + U32 i; + for (i =3D 0; i < fastHashFillStep; ++i) { + size_t const smHashAndTag =3D ZSTD_hashPtr(ip + i, hBitsS, mls= ); + size_t const lgHashAndTag =3D ZSTD_hashPtr(ip + i, hBitsL, 8); + if (i =3D=3D 0) { + ZSTD_writeTaggedIndex(hashSmall, smHashAndTag, curr + i); + } + if (i =3D=3D 0 || hashLarge[lgHashAndTag >> ZSTD_SHORT_CACHE_T= AG_BITS] =3D=3D 0) { + ZSTD_writeTaggedIndex(hashLarge, lgHashAndTag, curr + i); + } + /* Only load extra positions for ZSTD_dtlm_full */ + if (dtlm =3D=3D ZSTD_dtlm_fast) + break; + } } +} + +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +void ZSTD_fillDoubleHashTableForCCtx(ZSTD_MatchState_t* ms, void const* end, ZSTD_dictTableLoadMethod_e = dtlm) { const ZSTD_compressionParameters* const cParams =3D &ms->cParams; @@ -43,13 +85,26 @@ void ZSTD_fillDoubleHashTable(ZSTD_matchState_t* ms, /* Only load extra positions for ZSTD_dtlm_full */ if (dtlm =3D=3D ZSTD_dtlm_fast) break; - } } + } } +} + +void ZSTD_fillDoubleHashTable(ZSTD_MatchState_t* ms, + const void* const end, + ZSTD_dictTableLoadMethod_e dtlm, + ZSTD_tableFillPurpose_e tfp) +{ + if (tfp =3D=3D ZSTD_tfp_forCDict) { + ZSTD_fillDoubleHashTableForCDict(ms, end, dtlm); + } else { + ZSTD_fillDoubleHashTableForCCtx(ms, end, dtlm); + } } =20 =20 FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR size_t ZSTD_compressBlock_doubleFast_noDict_generic( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize, U32 const mls /* template */) { ZSTD_compressionParameters const* cParams =3D &ms->cParams; @@ -67,7 +122,7 @@ size_t ZSTD_compressBlock_doubleFast_noDict_generic( const BYTE* const iend =3D istart + srcSize; const BYTE* const ilimit =3D iend - HASH_READ_SIZE; U32 offset_1=3Drep[0], offset_2=3Drep[1]; - U32 offsetSaved =3D 0; + U32 offsetSaved1 =3D 0, offsetSaved2 =3D 0; =20 size_t mLength; U32 offset; @@ -88,9 +143,14 @@ size_t ZSTD_compressBlock_doubleFast_noDict_generic( const BYTE* matchl0; /* the long match for ip */ const BYTE* matchs0; /* the short match for ip */ const BYTE* matchl1; /* the long match for ip1 */ + const BYTE* matchs0_safe; /* matchs0 or safe address */ =20 const BYTE* ip =3D istart; /* the current position */ const BYTE* ip1; /* the next position */ + /* Array of ~random data, should have low probability of matching data + * we load from here instead of from tables, if matchl0/matchl1 are + * invalid indices. Used to avoid unpredictable branches. 
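The dummy array introduced above exists so that loads can proceed unconditionally: when an index is below the valid window, the code reads from the dummy bytes instead of branching, and folds validity into the subsequent match comparison. A simplified model of the ZSTD_selectAddr idea follows; the real helper takes the fallback address as a parameter, here it is baked in.

#include <stdio.h>

static const unsigned char dummy[] = {0x12,0x34,0x56,0x78,0x9a,0xbc,0xde,0xf0};

/* The ternary typically compiles to a conditional move, not a branch,
 * so the result is always a safely dereferenceable address. */
static const unsigned char* selectAddr(unsigned idx, unsigned lowLimit,
                                       const unsigned char* candidate)
{
    return idx >= lowLimit ? candidate : dummy;
}

int main(void)
{
    unsigned char window[16] = {0};
    const unsigned char* p1 = selectAddr(100, 50, window); /* valid: window  */
    const unsigned char* p2 = selectAddr(10, 50, window);  /* invalid: dummy */
    printf("%d %d\n", p1 == window, p2 == dummy);          /* prints: 1 1    */
    return 0;
}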
*/ + const BYTE dummy[] =3D {0x12,0x34,0x56,0x78,0x9a,0xbc,0xde,0xf0,0xe2,0= xb4}; =20 DEBUGLOG(5, "ZSTD_compressBlock_doubleFast_noDict_generic"); =20 @@ -100,8 +160,8 @@ size_t ZSTD_compressBlock_doubleFast_noDict_generic( U32 const current =3D (U32)(ip - base); U32 const windowLow =3D ZSTD_getLowestPrefixIndex(ms, current, cPa= rams->windowLog); U32 const maxRep =3D current - windowLow; - if (offset_2 > maxRep) offsetSaved =3D offset_2, offset_2 =3D 0; - if (offset_1 > maxRep) offsetSaved =3D offset_1, offset_1 =3D 0; + if (offset_2 > maxRep) offsetSaved2 =3D offset_2, offset_2 =3D 0; + if (offset_1 > maxRep) offsetSaved1 =3D offset_1, offset_1 =3D 0; } =20 /* Outer Loop: one iteration per match found and stored */ @@ -131,30 +191,35 @@ size_t ZSTD_compressBlock_doubleFast_noDict_generic( if ((offset_1 > 0) & (MEM_read32(ip+1-offset_1) =3D=3D MEM_rea= d32(ip+1))) { mLength =3D ZSTD_count(ip+1+4, ip+1+4-offset_1, iend) + 4; ip++; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend,= STORE_REPCODE_1, mLength); + ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend,= REPCODE1_TO_OFFBASE, mLength); goto _match_stored; } =20 hl1 =3D ZSTD_hashPtr(ip1, hBitsL, 8); =20 - if (idxl0 > prefixLowestIndex) { + /* idxl0 > prefixLowestIndex is a (somewhat) unpredictable bra= nch. + * However expression below compiles into conditional move. Si= nce + * match is unlikely and we only *branch* on idxl0 > prefixLow= estIndex + * if there is a match, all branches become predictable. */ + { const BYTE* const matchl0_safe =3D ZSTD_selectAddr(idxl0,= prefixLowestIndex, matchl0, &dummy[0]); + /* check prefix long match */ - if (MEM_read64(matchl0) =3D=3D MEM_read64(ip)) { + if (MEM_read64(matchl0_safe) =3D=3D MEM_read64(ip) && matc= hl0_safe =3D=3D matchl0) { mLength =3D ZSTD_count(ip+8, matchl0+8, iend) + 8; offset =3D (U32)(ip-matchl0); while (((ip>anchor) & (matchl0>prefixLowest)) && (ip[-= 1] =3D=3D matchl0[-1])) { ip--; matchl0--; mLength++; } /* catch up */ goto _match_found; - } - } + } } =20 idxl1 =3D hashLong[hl1]; matchl1 =3D base + idxl1; =20 - if (idxs0 > prefixLowestIndex) { - /* check prefix short match */ - if (MEM_read32(matchs0) =3D=3D MEM_read32(ip)) { - goto _search_next_long; - } + /* Same optimization as matchl0 above */ + matchs0_safe =3D ZSTD_selectAddr(idxs0, prefixLowestIndex, mat= chs0, &dummy[0]); + + /* check prefix short match */ + if(MEM_read32(matchs0_safe) =3D=3D MEM_read32(ip) && matchs0_s= afe =3D=3D matchs0) { + goto _search_next_long; } =20 if (ip1 >=3D nextStep) { @@ -175,30 +240,36 @@ size_t ZSTD_compressBlock_doubleFast_noDict_generic( } while (ip1 <=3D ilimit); =20 _cleanup: + /* If offset_1 started invalid (offsetSaved1 !=3D 0) and became va= lid (offset_1 !=3D 0), + * rotate saved offsets. See comment in ZSTD_compressBlock_fast_no= Dict for more context. */ + offsetSaved2 =3D ((offsetSaved1 !=3D 0) && (offset_1 !=3D 0)) ? of= fsetSaved1 : offsetSaved2; + /* save reps for next block */ - rep[0] =3D offset_1 ? offset_1 : offsetSaved; - rep[1] =3D offset_2 ? offset_2 : offsetSaved; + rep[0] =3D offset_1 ? offset_1 : offsetSaved1; + rep[1] =3D offset_2 ?
offset_2 : offsetSaved2; =20 /* Return the last literals size */ return (size_t)(iend - anchor); =20 _search_next_long: =20 - /* check prefix long +1 match */ - if (idxl1 > prefixLowestIndex) { - if (MEM_read64(matchl1) =3D=3D MEM_read64(ip1)) { + /* short match found: let's check for a longer one */ + mLength =3D ZSTD_count(ip+4, matchs0+4, iend) + 4; + offset =3D (U32)(ip - matchs0); + + /* check long match at +1 position */ + if ((idxl1 > prefixLowestIndex) && (MEM_read64(matchl1) =3D=3D MEM= _read64(ip1))) { + size_t const l1len =3D ZSTD_count(ip1+8, matchl1+8, iend) + 8; + if (l1len > mLength) { + /* use the long match instead */ ip =3D ip1; - mLength =3D ZSTD_count(ip+8, matchl1+8, iend) + 8; + mLength =3D l1len; offset =3D (U32)(ip-matchl1); - while (((ip>anchor) & (matchl1>prefixLowest)) && (ip[-1] = =3D=3D matchl1[-1])) { ip--; matchl1--; mLength++; } /* catch up */ - goto _match_found; + matchs0 =3D matchl1; } } =20 - /* if no long +1 match, explore the short match we found */ - mLength =3D ZSTD_count(ip+4, matchs0+4, iend) + 4; - offset =3D (U32)(ip - matchs0); - while (((ip>anchor) & (matchs0>prefixLowest)) && (ip[-1] =3D=3D ma= tchs0[-1])) { ip--; matchs0--; mLength++; } /* catch up */ + while (((ip>anchor) & (matchs0>prefixLowest)) && (ip[-1] =3D=3D ma= tchs0[-1])) { ip--; matchs0--; mLength++; } /* complete backward */ =20 /* fall-through */ =20 @@ -217,7 +288,7 @@ size_t ZSTD_compressBlock_doubleFast_noDict_generic( hashLong[hl1] =3D (U32)(ip1 - base); } =20 - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, STORE_O= FFSET(offset), mLength); + ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, OFFSET_= TO_OFFBASE(offset), mLength); =20 _match_stored: /* match found */ @@ -243,7 +314,7 @@ size_t ZSTD_compressBlock_doubleFast_noDict_generic( U32 const tmpOff =3D offset_2; offset_2 =3D offset_1; offs= et_1 =3D tmpOff; /* swap offset_2 <=3D> offset_1 */ hashSmall[ZSTD_hashPtr(ip, hBitsS, mls)] =3D (U32)(ip-base= ); hashLong[ZSTD_hashPtr(ip, hBitsL, 8)] =3D (U32)(ip-base); - ZSTD_storeSeq(seqStore, 0, anchor, iend, STORE_REPCODE_1, = rLength); + ZSTD_storeSeq(seqStore, 0, anchor, iend, REPCODE1_TO_OFFBA= SE, rLength); ip +=3D rLength; anchor =3D ip; continue; /* faster when present ... (?) 
*/ @@ -254,8 +325,9 @@ size_t ZSTD_compressBlock_doubleFast_noDict_generic( =20 =20 FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR size_t ZSTD_compressBlock_doubleFast_dictMatchState_generic( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize, U32 const mls /* template */) { @@ -275,9 +347,8 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_gen= eric( const BYTE* const iend =3D istart + srcSize; const BYTE* const ilimit =3D iend - HASH_READ_SIZE; U32 offset_1=3Drep[0], offset_2=3Drep[1]; - U32 offsetSaved =3D 0; =20 - const ZSTD_matchState_t* const dms =3D ms->dictMatchState; + const ZSTD_MatchState_t* const dms =3D ms->dictMatchState; const ZSTD_compressionParameters* const dictCParams =3D &dms->cParams; const U32* const dictHashLong =3D dms->hashTable; const U32* const dictHashSmall =3D dms->chainTable; @@ -286,8 +357,8 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_gen= eric( const BYTE* const dictStart =3D dictBase + dictStartIndex; const BYTE* const dictEnd =3D dms->window.nextSrc; const U32 dictIndexDelta =3D prefixLowestIndex - (U32)(dictEnd -= dictBase); - const U32 dictHBitsL =3D dictCParams->hashLog; - const U32 dictHBitsS =3D dictCParams->chainLog; + const U32 dictHBitsL =3D dictCParams->hashLog + ZSTD_SHORT_C= ACHE_TAG_BITS; + const U32 dictHBitsS =3D dictCParams->chainLog + ZSTD_SHORT_= CACHE_TAG_BITS; const U32 dictAndPrefixLength =3D (U32)((ip - prefixLowest) + (dictEn= d - dictStart)); =20 DEBUGLOG(5, "ZSTD_compressBlock_doubleFast_dictMatchState_generic"); @@ -295,6 +366,13 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_ge= neric( /* if a dictionary is attached, it must be within window range */ assert(ms->window.dictLimit + (1U << cParams->windowLog) >=3D endIndex= ); =20 + if (ms->prefetchCDictTables) { + size_t const hashTableBytes =3D (((size_t)1) << dictCParams->hashL= og) * sizeof(U32); + size_t const chainTableBytes =3D (((size_t)1) << dictCParams->chai= nLog) * sizeof(U32); + PREFETCH_AREA(dictHashLong, hashTableBytes); + PREFETCH_AREA(dictHashSmall, chainTableBytes); + } + /* init */ ip +=3D (dictAndPrefixLength =3D=3D 0); =20 @@ -309,8 +387,12 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_ge= neric( U32 offset; size_t const h2 =3D ZSTD_hashPtr(ip, hBitsL, 8); size_t const h =3D ZSTD_hashPtr(ip, hBitsS, mls); - size_t const dictHL =3D ZSTD_hashPtr(ip, dictHBitsL, 8); - size_t const dictHS =3D ZSTD_hashPtr(ip, dictHBitsS, mls); + size_t const dictHashAndTagL =3D ZSTD_hashPtr(ip, dictHBitsL, 8); + size_t const dictHashAndTagS =3D ZSTD_hashPtr(ip, dictHBitsS, mls); + U32 const dictMatchIndexAndTagL =3D dictHashLong[dictHashAndTagL >= > ZSTD_SHORT_CACHE_TAG_BITS]; + U32 const dictMatchIndexAndTagS =3D dictHashSmall[dictHashAndTagS = >> ZSTD_SHORT_CACHE_TAG_BITS]; + int const dictTagsMatchL =3D ZSTD_comparePackedTags(dictMatchIndex= AndTagL, dictHashAndTagL); + int const dictTagsMatchS =3D ZSTD_comparePackedTags(dictMatchIndex= AndTagS, dictHashAndTagS); U32 const curr =3D (U32)(ip-base); U32 const matchIndexL =3D hashLong[h2]; U32 matchIndexS =3D hashSmall[h]; @@ -323,26 +405,24 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_g= eneric( hashLong[h2] =3D hashSmall[h] =3D curr; /* update hash tables */ =20 /* check repcode */ - if (((U32)((prefixLowestIndex-1) - repIndex) >=3D 3 /* intentional= underflow */) + if ((ZSTD_index_overlap_check(prefixLowestIndex, repIndex)) && (MEM_read32(repMatch) =3D=3D 
MEM_read32(ip+1)) ) { const BYTE* repMatchEnd =3D repIndex < prefixLowestIndex ? dic= tEnd : iend; mLength =3D ZSTD_count_2segments(ip+1+4, repMatch+4, iend, rep= MatchEnd, prefixLowest) + 4; ip++; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, STO= RE_REPCODE_1, mLength); + ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, REP= CODE1_TO_OFFBASE, mLength); goto _match_stored; } =20 - if (matchIndexL > prefixLowestIndex) { + if ((matchIndexL >=3D prefixLowestIndex) && (MEM_read64(matchLong)= =3D=3D MEM_read64(ip))) { /* check prefix long match */ - if (MEM_read64(matchLong) =3D=3D MEM_read64(ip)) { - mLength =3D ZSTD_count(ip+8, matchLong+8, iend) + 8; - offset =3D (U32)(ip-matchLong); - while (((ip>anchor) & (matchLong>prefixLowest)) && (ip[-1]= =3D=3D matchLong[-1])) { ip--; matchLong--; mLength++; } /* catch up */ - goto _match_found; - } - } else { + mLength =3D ZSTD_count(ip+8, matchLong+8, iend) + 8; + offset =3D (U32)(ip-matchLong); + while (((ip>anchor) & (matchLong>prefixLowest)) && (ip[-1] =3D= =3D matchLong[-1])) { ip--; matchLong--; mLength++; } /* catch up */ + goto _match_found; + } else if (dictTagsMatchL) { /* check dictMatchState long match */ - U32 const dictMatchIndexL =3D dictHashLong[dictHL]; + U32 const dictMatchIndexL =3D dictMatchIndexAndTagL >> ZSTD_SH= ORT_CACHE_TAG_BITS; const BYTE* dictMatchL =3D dictBase + dictMatchIndexL; assert(dictMatchL < dictEnd); =20 @@ -354,13 +434,13 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_g= eneric( } } =20 if (matchIndexS > prefixLowestIndex) { - /* check prefix short match */ + /* short match candidate */ if (MEM_read32(match) =3D=3D MEM_read32(ip)) { goto _search_next_long; } - } else { + } else if (dictTagsMatchS) { /* check dictMatchState short match */ - U32 const dictMatchIndexS =3D dictHashSmall[dictHS]; + U32 const dictMatchIndexS =3D dictMatchIndexAndTagS >> ZSTD_SH= ORT_CACHE_TAG_BITS; match =3D dictBase + dictMatchIndexS; matchIndexS =3D dictMatchIndexS + dictIndexDelta; =20 @@ -375,25 +455,24 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_g= eneric( continue; =20 _search_next_long: - { size_t const hl3 =3D ZSTD_hashPtr(ip+1, hBitsL, 8); - size_t const dictHLNext =3D ZSTD_hashPtr(ip+1, dictHBitsL, 8); + size_t const dictHashAndTagL3 =3D ZSTD_hashPtr(ip+1, dictHBits= L, 8); U32 const matchIndexL3 =3D hashLong[hl3]; + U32 const dictMatchIndexAndTagL3 =3D dictHashLong[dictHashAndT= agL3 >> ZSTD_SHORT_CACHE_TAG_BITS]; + int const dictTagsMatchL3 =3D ZSTD_comparePackedTags(dictMatch= IndexAndTagL3, dictHashAndTagL3); const BYTE* matchL3 =3D base + matchIndexL3; hashLong[hl3] =3D curr + 1; =20 /* check prefix long +1 match */ - if (matchIndexL3 > prefixLowestIndex) { - if (MEM_read64(matchL3) =3D=3D MEM_read64(ip+1)) { - mLength =3D ZSTD_count(ip+9, matchL3+8, iend) + 8; - ip++; - offset =3D (U32)(ip-matchL3); - while (((ip>anchor) & (matchL3>prefixLowest)) && (ip[-= 1] =3D=3D matchL3[-1])) { ip--; matchL3--; mLength++; } /* catch up */ - goto _match_found; - } - } else { + if ((matchIndexL3 >=3D prefixLowestIndex) && (MEM_read64(match= L3) =3D=3D MEM_read64(ip+1))) { + mLength =3D ZSTD_count(ip+9, matchL3+8, iend) + 8; + ip++; + offset =3D (U32)(ip-matchL3); + while (((ip>anchor) & (matchL3>prefixLowest)) && (ip[-1] = =3D=3D matchL3[-1])) { ip--; matchL3--; mLength++; } /* catch up */ + goto _match_found; + } else if (dictTagsMatchL3) { /* check dict long +1 match */ - U32 const dictMatchIndexL3 =3D dictHashLong[dictHLNext]; + U32 const dictMatchIndexL3 =3D dictMatchIndexAndTagL3 >> 
Z= STD_SHORT_CACHE_TAG_BITS; const BYTE* dictMatchL3 =3D dictBase + dictMatchIndexL3; assert(dictMatchL3 < dictEnd); if (dictMatchL3 > dictStart && MEM_read64(dictMatchL3) =3D= =3D MEM_read64(ip+1)) { @@ -419,7 +498,7 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_gen= eric( offset_2 =3D offset_1; offset_1 =3D offset; =20 - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, STORE_O= FFSET(offset), mLength); + ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, OFFSET_= TO_OFFBASE(offset), mLength); =20 _match_stored: /* match found */ @@ -443,12 +522,12 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_g= eneric( const BYTE* repMatch2 =3D repIndex2 < prefixLowestIndex ? dictBase + repIndex2 - dictIndexDelta : base + repIndex2; - if ( ((U32)((prefixLowestIndex-1) - (U32)repIndex2) >=3D 3= /* intentional overflow */) + if ( (ZSTD_index_overlap_check(prefixLowestIndex, repIndex= 2)) && (MEM_read32(repMatch2) =3D=3D MEM_read32(ip)) ) { const BYTE* const repEnd2 =3D repIndex2 < prefixLowest= Index ? dictEnd : iend; size_t const repLength2 =3D ZSTD_count_2segments(ip+4,= repMatch2+4, iend, repEnd2, prefixLowest) + 4; U32 tmpOffset =3D offset_2; offset_2 =3D offset_1; off= set_1 =3D tmpOffset; /* swap offset_2 <=3D> offset_1 */ - ZSTD_storeSeq(seqStore, 0, anchor, iend, STORE_REPCODE= _1, repLength2); + ZSTD_storeSeq(seqStore, 0, anchor, iend, REPCODE1_TO_O= FFBASE, repLength2); hashSmall[ZSTD_hashPtr(ip, hBitsS, mls)] =3D current2; hashLong[ZSTD_hashPtr(ip, hBitsL, 8)] =3D current2; ip +=3D repLength2; @@ -461,8 +540,8 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_gen= eric( } /* while (ip < ilimit) */ =20 /* save reps for next block */ - rep[0] =3D offset_1 ? offset_1 : offsetSaved; - rep[1] =3D offset_2 ? offset_2 : offsetSaved; + rep[0] =3D offset_1; + rep[1] =3D offset_2; =20 /* Return the last literals size */ return (size_t)(iend - anchor); @@ -470,7 +549,7 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_gen= eric( =20 #define ZSTD_GEN_DFAST_FN(dictMode, mls) = \ static size_t ZSTD_compressBlock_doubleFast_##dictMode##_##mls( = \ - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_= NUM], \ + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_= NUM], \ void const* src, size_t srcSize) = \ { = \ return ZSTD_compressBlock_doubleFast_##dictMode##_generic(ms, seqS= tore, rep, src, srcSize, mls); \ @@ -488,7 +567,7 @@ ZSTD_GEN_DFAST_FN(dictMatchState, 7) =20 =20 size_t ZSTD_compressBlock_doubleFast( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { const U32 mls =3D ms->cParams.minMatch; @@ -508,7 +587,7 @@ size_t ZSTD_compressBlock_doubleFast( =20 =20 size_t ZSTD_compressBlock_doubleFast_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { const U32 mls =3D ms->cParams.minMatch; @@ -527,8 +606,10 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState( } =20 =20 -static size_t ZSTD_compressBlock_doubleFast_extDict_generic( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t ZSTD_compressBlock_doubleFast_extDict_generic( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize, U32 const mls /* template */) { @@ -579,13 +660,13 @@ static size_t 
ZSTD_compressBlock_doubleFast_extDict_g= eneric( size_t mLength; hashSmall[hSmall] =3D hashLong[hLong] =3D curr; /* update hash t= able */ =20 - if ((((U32)((prefixStartIndex-1) - repIndex) >=3D 3) /* intentiona= l underflow : ensure repIndex doesn't overlap dict + prefix */ + if (((ZSTD_index_overlap_check(prefixStartIndex, repIndex)) & (offset_1 <=3D curr+1 - dictStartIndex)) /* note: we are sea= rching at curr+1 */ && (MEM_read32(repMatch) =3D=3D MEM_read32(ip+1)) ) { const BYTE* repMatchEnd =3D repIndex < prefixStartIndex ? dict= End : iend; mLength =3D ZSTD_count_2segments(ip+1+4, repMatch+4, iend, rep= MatchEnd, prefixStart) + 4; ip++; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, STO= RE_REPCODE_1, mLength); + ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, REP= CODE1_TO_OFFBASE, mLength); } else { if ((matchLongIndex > dictStartIndex) && (MEM_read64(matchLong= ) =3D=3D MEM_read64(ip))) { const BYTE* const matchEnd =3D matchLongIndex < prefixStar= tIndex ? dictEnd : iend; @@ -596,7 +677,7 @@ static size_t ZSTD_compressBlock_doubleFast_extDict_gen= eric( while (((ip>anchor) & (matchLong>lowMatchPtr)) && (ip[-1] = =3D=3D matchLong[-1])) { ip--; matchLong--; mLength++; } /* catch up */ offset_2 =3D offset_1; offset_1 =3D offset; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend,= STORE_OFFSET(offset), mLength); + ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend,= OFFSET_TO_OFFBASE(offset), mLength); =20 } else if ((matchIndex > dictStartIndex) && (MEM_read32(match)= =3D=3D MEM_read32(ip))) { size_t const h3 =3D ZSTD_hashPtr(ip+1, hBitsL, 8); @@ -621,7 +702,7 @@ static size_t ZSTD_compressBlock_doubleFast_extDict_gen= eric( } offset_2 =3D offset_1; offset_1 =3D offset; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend,= STORE_OFFSET(offset), mLength); + ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend,= OFFSET_TO_OFFBASE(offset), mLength); =20 } else { ip +=3D ((ip-anchor) >> kSearchStrength) + 1; @@ -647,13 +728,13 @@ static size_t ZSTD_compressBlock_doubleFast_extDict_g= eneric( U32 const current2 =3D (U32)(ip-base); U32 const repIndex2 =3D current2 - offset_2; const BYTE* repMatch2 =3D repIndex2 < prefixStartIndex ? d= ictBase + repIndex2 : base + repIndex2; - if ( (((U32)((prefixStartIndex-1) - repIndex2) >=3D 3) /= * intentional overflow : ensure repIndex2 doesn't overlap dict + prefix */ + if ( ((ZSTD_index_overlap_check(prefixStartIndex, repIndex= 2)) & (offset_2 <=3D current2 - dictStartIndex)) && (MEM_read32(repMatch2) =3D=3D MEM_read32(ip)) ) { const BYTE* const repEnd2 =3D repIndex2 < prefixStartI= ndex ? 
dictEnd : iend; size_t const repLength2 =3D ZSTD_count_2segments(ip+4,= repMatch2+4, iend, repEnd2, prefixStart) + 4; U32 const tmpOffset =3D offset_2; offset_2 =3D offset_= 1; offset_1 =3D tmpOffset; /* swap offset_2 <=3D> offset_1 */ - ZSTD_storeSeq(seqStore, 0, anchor, iend, STORE_REPCODE= _1, repLength2); + ZSTD_storeSeq(seqStore, 0, anchor, iend, REPCODE1_TO_O= FFBASE, repLength2); hashSmall[ZSTD_hashPtr(ip, hBitsS, mls)] =3D current2; hashLong[ZSTD_hashPtr(ip, hBitsL, 8)] =3D current2; ip +=3D repLength2; @@ -677,7 +758,7 @@ ZSTD_GEN_DFAST_FN(extDict, 6) ZSTD_GEN_DFAST_FN(extDict, 7) =20 size_t ZSTD_compressBlock_doubleFast_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { U32 const mls =3D ms->cParams.minMatch; @@ -694,3 +775,5 @@ size_t ZSTD_compressBlock_doubleFast_extDict( return ZSTD_compressBlock_doubleFast_extDict_7(ms, seqStore, rep, = src, srcSize); } } + +#endif /* ZSTD_EXCLUDE_DFAST_BLOCK_COMPRESSOR */ diff --git a/lib/zstd/compress/zstd_double_fast.h b/lib/zstd/compress/zstd_= double_fast.h index 6822bde65a1d..011556ce56f7 100644 --- a/lib/zstd/compress/zstd_double_fast.h +++ b/lib/zstd/compress/zstd_double_fast.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -11,22 +12,32 @@ #ifndef ZSTD_DOUBLE_FAST_H #define ZSTD_DOUBLE_FAST_H =20 - #include "../common/mem.h" /* U32 */ #include "zstd_compress_internal.h" /* ZSTD_CCtx, size_t */ =20 -void ZSTD_fillDoubleHashTable(ZSTD_matchState_t* ms, - void const* end, ZSTD_dictTableLoadMethod_e = dtlm); +#ifndef ZSTD_EXCLUDE_DFAST_BLOCK_COMPRESSOR + +void ZSTD_fillDoubleHashTable(ZSTD_MatchState_t* ms, + void const* end, ZSTD_dictTableLoadMethod_e = dtlm, + ZSTD_tableFillPurpose_e tfp); + size_t ZSTD_compressBlock_doubleFast( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_doubleFast_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_doubleFast_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); =20 - +#define ZSTD_COMPRESSBLOCK_DOUBLEFAST ZSTD_compressBlock_doubleFast +#define ZSTD_COMPRESSBLOCK_DOUBLEFAST_DICTMATCHSTATE ZSTD_compressBlock_do= ubleFast_dictMatchState +#define ZSTD_COMPRESSBLOCK_DOUBLEFAST_EXTDICT ZSTD_compressBlock_doubleFas= t_extDict +#else +#define ZSTD_COMPRESSBLOCK_DOUBLEFAST NULL +#define ZSTD_COMPRESSBLOCK_DOUBLEFAST_DICTMATCHSTATE NULL +#define ZSTD_COMPRESSBLOCK_DOUBLEFAST_EXTDICT NULL +#endif /* ZSTD_EXCLUDE_DFAST_BLOCK_COMPRESSOR */ =20 #endif /* ZSTD_DOUBLE_FAST_H */ diff --git a/lib/zstd/compress/zstd_fast.c b/lib/zstd/compress/zstd_fast.c index a752e6beab52..60e07e839e5f 100644 --- a/lib/zstd/compress/zstd_fast.c +++ b/lib/zstd/compress/zstd_fast.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. 
diff --git a/lib/zstd/compress/zstd_fast.c b/lib/zstd/compress/zstd_fast.c
index a752e6beab52..60e07e839e5f 100644
--- a/lib/zstd/compress/zstd_fast.c
+++ b/lib/zstd/compress/zstd_fast.c
@@ -1,5 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -11,8 +12,46 @@
 #include "zstd_compress_internal.h"  /* ZSTD_hashPtr, ZSTD_count, ZSTD_storeSeq */
 #include "zstd_fast.h"

+static
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+void ZSTD_fillHashTableForCDict(ZSTD_MatchState_t* ms,
+                        const void* const end,
+                        ZSTD_dictTableLoadMethod_e dtlm)
+{
+    const ZSTD_compressionParameters* const cParams = &ms->cParams;
+    U32* const hashTable = ms->hashTable;
+    U32 const hBits = cParams->hashLog + ZSTD_SHORT_CACHE_TAG_BITS;
+    U32 const mls = cParams->minMatch;
+    const BYTE* const base = ms->window.base;
+    const BYTE* ip = base + ms->nextToUpdate;
+    const BYTE* const iend = ((const BYTE*)end) - HASH_READ_SIZE;
+    const U32 fastHashFillStep = 3;
+
+    /* Currently, we always use ZSTD_dtlm_full for filling CDict tables.
+     * Feel free to remove this assert if there's a good reason! */
+    assert(dtlm == ZSTD_dtlm_full);
+
+    /* Always insert every fastHashFillStep position into the hash table.
+     * Insert the other positions if their hash entry is empty.
+     */
+    for ( ; ip + fastHashFillStep < iend + 2; ip += fastHashFillStep) {
+        U32 const curr = (U32)(ip - base);
+        {   size_t const hashAndTag = ZSTD_hashPtr(ip, hBits, mls);
+            ZSTD_writeTaggedIndex(hashTable, hashAndTag, curr);   }
+
+        if (dtlm == ZSTD_dtlm_fast) continue;
+        /* Only load extra positions for ZSTD_dtlm_full */
+        {   U32 p;
+            for (p = 1; p < fastHashFillStep; ++p) {
+                size_t const hashAndTag = ZSTD_hashPtr(ip + p, hBits, mls);
+                if (hashTable[hashAndTag >> ZSTD_SHORT_CACHE_TAG_BITS] == 0) {  /* not yet filled */
+                    ZSTD_writeTaggedIndex(hashTable, hashAndTag, curr + p);
+    }   }   }   }
+}

-void ZSTD_fillHashTable(ZSTD_matchState_t* ms,
+static
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+void ZSTD_fillHashTableForCCtx(ZSTD_MatchState_t* ms,
                         const void* const end,
                         ZSTD_dictTableLoadMethod_e dtlm)
 {
@@ -25,6 +64,10 @@ void ZSTD_fillHashTable(ZSTD_matchState_t* ms,
     const BYTE* const iend = ((const BYTE*)end) - HASH_READ_SIZE;
     const U32 fastHashFillStep = 3;

+    /* Currently, we always use ZSTD_dtlm_fast for filling CCtx tables.
+     * Feel free to remove this assert if there's a good reason! */
+    assert(dtlm == ZSTD_dtlm_fast);
+
     /* Always insert every fastHashFillStep position into the hash table.
      * Insert the other positions if their hash entry is empty.
      */
@@ -42,6 +85,60 @@ void ZSTD_fillHashTable(ZSTD_matchState_t* ms,
     }   }   }   }
 }

+void ZSTD_fillHashTable(ZSTD_MatchState_t* ms,
+                        const void* const end,
+                        ZSTD_dictTableLoadMethod_e dtlm,
+                        ZSTD_tableFillPurpose_e tfp)
+{
+    if (tfp == ZSTD_tfp_forCDict) {
+        ZSTD_fillHashTableForCDict(ms, end, dtlm);
+    } else {
+        ZSTD_fillHashTableForCCtx(ms, end, dtlm);
+    }
+}
+
+
+typedef int (*ZSTD_match4Found) (const BYTE* currentPtr, const BYTE* matchAddress, U32 matchIdx, U32 idxLowLimit);
+
+static int
+ZSTD_match4Found_cmov(const BYTE* currentPtr, const BYTE* matchAddress, U32 matchIdx, U32 idxLowLimit)
+{
+    /* Array of ~random data, should have low probability of matching data.
+     * Load from here if the index is invalid.
+     * Used to avoid unpredictable branches. */
+    static const BYTE dummy[] = {0x12,0x34,0x56,0x78};
+
+    /* currentIdx >= lowLimit is a (somewhat) unpredictable branch.
+     * However expression below compiles into conditional move.
+     */
+    const BYTE* mvalAddr = ZSTD_selectAddr(matchIdx, idxLowLimit, matchAddress, dummy);
+    /* Note: this used to be written as : return test1 && test2;
+     * Unfortunately, once inlined, these tests become branches,
+     * in which case it becomes critical that they are executed in the right order (test1 then test2).
+     * So we have to write these tests in a specific manner to ensure their ordering.
+     */
+    if (MEM_read32(currentPtr) != MEM_read32(mvalAddr)) return 0;
+    /* force ordering of these tests, which matters once the function is inlined, as they become branches */
+    __asm__("");
+    return matchIdx >= idxLowLimit;
+}
+
+static int
+ZSTD_match4Found_branch(const BYTE* currentPtr, const BYTE* matchAddress, U32 matchIdx, U32 idxLowLimit)
+{
+    /* using a branch instead of a cmov,
+     * because it's faster in scenarios where matchIdx >= idxLowLimit is generally true,
+     * aka almost all candidates are within range */
+    U32 mval;
+    if (matchIdx >= idxLowLimit) {
+        mval = MEM_read32(matchAddress);
+    } else {
+        mval = MEM_read32(currentPtr) ^ 1; /* guaranteed to not match. */
+    }
+
+    return (MEM_read32(currentPtr) == mval);
+}
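These two helpers answer the same question — "do the 4 bytes at the current position match the candidate, and is the candidate index in range?" — with opposite trade-offs: the cmov variant replaces the range check with a select between the real match address and a dummy buffer, while the branch variant keeps the test for workloads where it predicts well. A standalone sketch of the select idiom (hypothetical names; the real code relies on ZSTD_selectAddr plus an empty asm to pin the test order):

    /* Sketch: branchless candidate check via an address select. */
    #include <string.h>   /* memcpy */

    static const unsigned char dummy4[4] = { 0x12, 0x34, 0x56, 0x78 };

    static inline const unsigned char*
    select_addr(unsigned idx, unsigned lowLimit,
                const unsigned char* candidate, const unsigned char* fallback)
    {
        /* typically compiled to a conditional move, so the 4-byte
         * compare below never waits on a mispredicted branch */
        return (idx >= lowLimit) ? candidate : fallback;
    }

    static inline int
    match4_found(const unsigned char* ip, const unsigned char* cand,
                 unsigned idx, unsigned lowLimit)
    {
        const unsigned char* p = select_addr(idx, lowLimit, cand, dummy4);
        unsigned a, b;
        memcpy(&a, ip, 4);   /* unaligned-safe 4-byte loads */
        memcpy(&b, p, 4);
        return (a == b) && (idx >= lowLimit);
    }

The dummy load makes the first comparison valid even for out-of-range indices; note the sketch's plain `&&` is logically correct but, as the patch's own comment warns, may compile back into branches once inlined, which is exactly why the real code sequences the two tests explicitly. Further down, ZSTD_compressBlock_fast picks the cmov variant for small windows (windowLog < 19), where candidates fall out of range often enough to make the branch unpredictable.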

 /*
  * If you squint hard enough (and ignore repcodes), the search operation at any
@@ -89,17 +186,17 @@ void ZSTD_fillHashTable(ZSTD_matchState_t* ms,
  *
  * This is also the work we do at the beginning to enter the loop initially.
  */
-FORCE_INLINE_TEMPLATE size_t
-ZSTD_compressBlock_fast_noDict_generic(
-        ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
+FORCE_INLINE_TEMPLATE
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+size_t ZSTD_compressBlock_fast_noDict_generic(
+        ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
         void const* src, size_t srcSize,
-        U32 const mls, U32 const hasStep)
+        U32 const mls, int useCmov)
 {
     const ZSTD_compressionParameters* const cParams = &ms->cParams;
     U32* const hashTable = ms->hashTable;
     U32 const hlog = cParams->hashLog;
-    /* support stepSize of 0 */
-    size_t const stepSize = hasStep ? (cParams->targetLength + !(cParams->targetLength) + 1) : 2;
+    size_t const stepSize = cParams->targetLength + !(cParams->targetLength) + 1; /* min 2 */
     const BYTE* const base = ms->window.base;
     const BYTE* const istart = (const BYTE*)src;
     const U32 endIndex = (U32)((size_t)(istart - base) + srcSize);
@@ -117,12 +214,11 @@ ZSTD_compressBlock_fast_noDict_generic(

     U32 rep_offset1 = rep[0];
     U32 rep_offset2 = rep[1];
-    U32 offsetSaved = 0;
+    U32 offsetSaved1 = 0, offsetSaved2 = 0;

     size_t hash0; /* hash for ip0 */
     size_t hash1; /* hash for ip1 */
-    U32 idx; /* match idx for ip0 */
-    U32 mval; /* src value at match idx */
+    U32 matchIdx; /* match idx for ip0 */

     U32 offcode;
     const BYTE* match0;
@@ -135,14 +231,15 @@ ZSTD_compressBlock_fast_noDict_generic(
     size_t step;
     const BYTE* nextStep;
     const size_t kStepIncr = (1 << (kSearchStrength - 1));
+    const ZSTD_match4Found matchFound = useCmov ? ZSTD_match4Found_cmov : ZSTD_match4Found_branch;

     DEBUGLOG(5, "ZSTD_compressBlock_fast_generic");
     ip0 += (ip0 == prefixStart);
     {   U32 const curr = (U32)(ip0 - base);
         U32 const windowLow = ZSTD_getLowestPrefixIndex(ms, curr, cParams->windowLog);
         U32 const maxRep = curr - windowLow;
-        if (rep_offset2 > maxRep) offsetSaved = rep_offset2, rep_offset2 = 0;
-        if (rep_offset1 > maxRep) offsetSaved = rep_offset1, rep_offset1 = 0;
+        if (rep_offset2 > maxRep) offsetSaved2 = rep_offset2, rep_offset2 = 0;
+        if (rep_offset1 > maxRep) offsetSaved1 = rep_offset1, rep_offset1 = 0;
     }

     /* start each op */
@@ -163,7 +260,7 @@ ZSTD_compressBlock_fast_noDict_generic(
     hash0 = ZSTD_hashPtr(ip0, hlog, mls);
     hash1 = ZSTD_hashPtr(ip1, hlog, mls);

-    idx = hashTable[hash0];
+    matchIdx = hashTable[hash0];

     do {
         /* load repcode match for ip[2]*/
@@ -180,26 +277,28 @@ ZSTD_compressBlock_fast_noDict_generic(
             mLength = ip0[-1] == match0[-1];
             ip0 -= mLength;
             match0 -= mLength;
-            offcode = STORE_REPCODE_1;
+            offcode = REPCODE1_TO_OFFBASE;
             mLength += 4;
+
+            /* Write next hash table entry: it's already calculated.
+             * This write is known to be safe because ip1 is before the
+             * repcode (ip2). */
+            hashTable[hash1] = (U32)(ip1 - base);
+
             goto _match;
         }

-        /* load match for ip[0] */
-        if (idx >= prefixStartIndex) {
-            mval = MEM_read32(base + idx);
-        } else {
-            mval = MEM_read32(ip0) ^ 1; /* guaranteed to not match. */
-        }
+        if (matchFound(ip0, base + matchIdx, matchIdx, prefixStartIndex)) {
+            /* Write next hash table entry (it's already calculated).
+             * This write is known to be safe because the ip1 == ip0 + 1,
+             * so searching will resume after ip1 */
+            hashTable[hash1] = (U32)(ip1 - base);

-        /* check match at ip[0] */
-        if (MEM_read32(ip0) == mval) {
-            /* found a match! */
             goto _offset;
         }

         /* lookup ip[1] */
-        idx = hashTable[hash1];
+        matchIdx = hashTable[hash1];

         /* hash ip[2] */
         hash0 = hash1;
@@ -214,21 +313,19 @@ ZSTD_compressBlock_fast_noDict_generic(
         current0 = (U32)(ip0 - base);
         hashTable[hash0] = current0;

-        /* load match for ip[0] */
-        if (idx >= prefixStartIndex) {
-            mval = MEM_read32(base + idx);
-        } else {
-            mval = MEM_read32(ip0) ^ 1; /* guaranteed to not match. */
-        }
-
-        /* check match at ip[0] */
-        if (MEM_read32(ip0) == mval) {
-            /* found a match! */
+        if (matchFound(ip0, base + matchIdx, matchIdx, prefixStartIndex)) {
+            /* Write next hash table entry, since it's already calculated */
+            if (step <= 4) {
+                /* Avoid writing an index if it's >= position where search will resume.
+                 * The minimum possible match has length 4, so search can resume at ip0 + 4.
+                 */
+                hashTable[hash1] = (U32)(ip1 - base);
+            }
             goto _offset;
         }

         /* lookup ip[1] */
-        idx = hashTable[hash1];
+        matchIdx = hashTable[hash1];

         /* hash ip[2] */
         hash0 = hash1;
@@ -250,13 +347,28 @@ ZSTD_compressBlock_fast_noDict_generic(
     } while (ip3 < ilimit);

_cleanup:
-    /* Note that there are probably still a couple positions we could search.
+    /* Note that there are probably still a couple positions one could search.
     * However, it seems to be a meaningful performance hit to try to search
     * them. So let's not. */

+    /* When the repcodes are outside of the prefix, we set them to zero before the loop.
+     * When the offsets are still zero, we need to restore them after the block to have a correct
+     * repcode history. If only one offset was invalid, it is easy.
The tricky case is when both
+     * offsets were invalid. We need to figure out which offset to refill with.
+     *    - If both offsets are zero they are in the same order.
+     *    - If both offsets are non-zero, we won't restore the offsets from `offsetSaved[12]`.
+     *    - If only one is zero, we need to decide which offset to restore.
+     *        - If rep_offset1 is non-zero, then rep_offset2 must be offsetSaved1.
+     *        - It is impossible for rep_offset2 to be non-zero.
+     *
+     * So if rep_offset1 started invalid (offsetSaved1 != 0) and became valid (rep_offset1 != 0), then
+     * set rep[0] = rep_offset1 and rep[1] = offsetSaved1.
+     */
+    offsetSaved2 = ((offsetSaved1 != 0) && (rep_offset1 != 0)) ? offsetSaved1 : offsetSaved2;
+
     /* save reps for next block */
-    rep[0] = rep_offset1 ? rep_offset1 : offsetSaved;
-    rep[1] = rep_offset2 ? rep_offset2 : offsetSaved;
+    rep[0] = rep_offset1 ? rep_offset1 : offsetSaved1;
+    rep[1] = rep_offset2 ? rep_offset2 : offsetSaved2;

     /* Return the last literals size */
     return (size_t)(iend - anchor);
@@ -264,10 +376,10 @@ ZSTD_compressBlock_fast_noDict_generic(
_offset: /* Requires: ip0, idx */

     /* Compute the offset code. */
-    match0 = base + idx;
+    match0 = base + matchIdx;
     rep_offset2 = rep_offset1;
     rep_offset1 = (U32)(ip0-match0);
-    offcode = STORE_OFFSET(rep_offset1);
+    offcode = OFFSET_TO_OFFBASE(rep_offset1);
     mLength = 4;

     /* Count the backwards match length. */
@@ -287,11 +399,6 @@ ZSTD_compressBlock_fast_noDict_generic(
     ip0 += mLength;
     anchor = ip0;

-    /* write next hash table entry */
-    if (ip1 < ip0) {
-        hashTable[hash1] = (U32)(ip1 - base);
-    }
-
     /* Fill table and check for immediate repcode. */
     if (ip0 <= ilimit) {
         /* Fill Table */
@@ -306,7 +413,7 @@ ZSTD_compressBlock_fast_noDict_generic(
                 { U32 const tmpOff = rep_offset2; rep_offset2 = rep_offset1; rep_offset1 = tmpOff; } /* swap rep_offset2 <=> rep_offset1 */
                 hashTable[ZSTD_hashPtr(ip0, hlog, mls)] = (U32)(ip0-base);
                 ip0 += rLength;
-                ZSTD_storeSeq(seqStore, 0 /*litLen*/, anchor, iend, STORE_REPCODE_1, rLength);
+                ZSTD_storeSeq(seqStore, 0 /*litLen*/, anchor, iend, REPCODE1_TO_OFFBASE, rLength);
                 anchor = ip0;
                 continue;   /* faster when present (confirmed on gcc-8) ... (?
*/ } } } @@ -314,12 +421,12 @@ ZSTD_compressBlock_fast_noDict_generic( goto _start; } =20 -#define ZSTD_GEN_FAST_FN(dictMode, mls, step) = \ - static size_t ZSTD_compressBlock_fast_##dictMode##_##mls##_##step( = \ - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_= NUM], \ +#define ZSTD_GEN_FAST_FN(dictMode, mml, cmov) = \ + static size_t ZSTD_compressBlock_fast_##dictMode##_##mml##_##cmov( = \ + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_= NUM], \ void const* src, size_t srcSize) = \ { = \ - return ZSTD_compressBlock_fast_##dictMode##_generic(ms, seqStore, = rep, src, srcSize, mls, step); \ + return ZSTD_compressBlock_fast_##dictMode##_generic(ms, seqStore, = rep, src, srcSize, mml, cmov); \ } =20 ZSTD_GEN_FAST_FN(noDict, 4, 1) @@ -333,13 +440,15 @@ ZSTD_GEN_FAST_FN(noDict, 6, 0) ZSTD_GEN_FAST_FN(noDict, 7, 0) =20 size_t ZSTD_compressBlock_fast( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - U32 const mls =3D ms->cParams.minMatch; + U32 const mml =3D ms->cParams.minMatch; + /* use cmov when "candidate in range" branch is likely unpredictable */ + int const useCmov =3D ms->cParams.windowLog < 19; assert(ms->dictMatchState =3D=3D NULL); - if (ms->cParams.targetLength > 1) { - switch(mls) + if (useCmov) { + switch(mml) { default: /* includes case 3 */ case 4 : @@ -352,7 +461,8 @@ size_t ZSTD_compressBlock_fast( return ZSTD_compressBlock_fast_noDict_7_1(ms, seqStore, rep, s= rc, srcSize); } } else { - switch(mls) + /* use a branch instead */ + switch(mml) { default: /* includes case 3 */ case 4 : @@ -364,13 +474,13 @@ size_t ZSTD_compressBlock_fast( case 7 : return ZSTD_compressBlock_fast_noDict_7_0(ms, seqStore, rep, s= rc, srcSize); } - } } =20 FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR size_t ZSTD_compressBlock_fast_dictMatchState_generic( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize, U32 const mls, U32 const hasStep) { const ZSTD_compressionParameters* const cParams =3D &ms->cParams; @@ -380,16 +490,16 @@ size_t ZSTD_compressBlock_fast_dictMatchState_generic( U32 const stepSize =3D cParams->targetLength + !(cParams->targetLength= ); const BYTE* const base =3D ms->window.base; const BYTE* const istart =3D (const BYTE*)src; - const BYTE* ip =3D istart; + const BYTE* ip0 =3D istart; + const BYTE* ip1 =3D ip0 + stepSize; /* we assert below that stepSize >= =3D 1 */ const BYTE* anchor =3D istart; const U32 prefixStartIndex =3D ms->window.dictLimit; const BYTE* const prefixStart =3D base + prefixStartIndex; const BYTE* const iend =3D istart + srcSize; const BYTE* const ilimit =3D iend - HASH_READ_SIZE; U32 offset_1=3Drep[0], offset_2=3Drep[1]; - U32 offsetSaved =3D 0; =20 - const ZSTD_matchState_t* const dms =3D ms->dictMatchState; + const ZSTD_MatchState_t* const dms =3D ms->dictMatchState; const ZSTD_compressionParameters* const dictCParams =3D &dms->cParams ; const U32* const dictHashTable =3D dms->hashTable; const U32 dictStartIndex =3D dms->window.dictLimit; @@ -397,13 +507,13 @@ size_t ZSTD_compressBlock_fast_dictMatchState_generic( const BYTE* const dictStart =3D dictBase + dictStartIndex; const BYTE* const dictEnd =3D dms->window.nextSrc; const U32 dictIndexDelta =3D prefixStartIndex - (U32)(dictEnd - = dictBase); - const U32 dictAndPrefixLength =3D (U32)(ip - prefixStart + dictEnd - = 
dictStart); - const U32 dictHLog =3D dictCParams->hashLog; + const U32 dictAndPrefixLength =3D (U32)(istart - prefixStart + dictEn= d - dictStart); + const U32 dictHBits =3D dictCParams->hashLog + ZSTD_SHORT_C= ACHE_TAG_BITS; =20 /* if a dictionary is still attached, it necessarily means that * it is within window size. So we just check it. */ const U32 maxDistance =3D 1U << cParams->windowLog; - const U32 endIndex =3D (U32)((size_t)(ip - base) + srcSize); + const U32 endIndex =3D (U32)((size_t)(istart - base) + srcSize); assert(endIndex - prefixStartIndex <=3D maxDistance); (void)maxDistance; (void)endIndex; /* these variables are not used w= hen assert() is disabled */ =20 @@ -413,106 +523,154 @@ size_t ZSTD_compressBlock_fast_dictMatchState_gener= ic( * when translating a dict index into a local index */ assert(prefixStartIndex >=3D (U32)(dictEnd - dictBase)); =20 + if (ms->prefetchCDictTables) { + size_t const hashTableBytes =3D (((size_t)1) << dictCParams->hashL= og) * sizeof(U32); + PREFETCH_AREA(dictHashTable, hashTableBytes); + } + /* init */ DEBUGLOG(5, "ZSTD_compressBlock_fast_dictMatchState_generic"); - ip +=3D (dictAndPrefixLength =3D=3D 0); + ip0 +=3D (dictAndPrefixLength =3D=3D 0); /* dictMatchState repCode checks don't currently handle repCode =3D=3D= 0 * disabling. */ assert(offset_1 <=3D dictAndPrefixLength); assert(offset_2 <=3D dictAndPrefixLength); =20 - /* Main Search Loop */ - while (ip < ilimit) { /* < instead of <=3D, because repcode check at= (ip+1) */ + /* Outer search loop */ + assert(stepSize >=3D 1); + while (ip1 <=3D ilimit) { /* repcode check at (ip0 + 1) is safe beca= use ip0 < ip1 */ size_t mLength; - size_t const h =3D ZSTD_hashPtr(ip, hlog, mls); - U32 const curr =3D (U32)(ip-base); - U32 const matchIndex =3D hashTable[h]; - const BYTE* match =3D base + matchIndex; - const U32 repIndex =3D curr + 1 - offset_1; - const BYTE* repMatch =3D (repIndex < prefixStartIndex) ? - dictBase + (repIndex - dictIndexDelta) : - base + repIndex; - hashTable[h] =3D curr; /* update hash table */ - - if ( ((U32)((prefixStartIndex-1) - repIndex) >=3D 3) /* intentiona= l underflow : ensure repIndex isn't overlapping dict + prefix */ - && (MEM_read32(repMatch) =3D=3D MEM_read32(ip+1)) ) { - const BYTE* const repMatchEnd =3D repIndex < prefixStartIndex = ? 
dictEnd : iend; - mLength =3D ZSTD_count_2segments(ip+1+4, repMatch+4, iend, rep= MatchEnd, prefixStart) + 4; - ip++; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, STO= RE_REPCODE_1, mLength); - } else if ( (matchIndex <=3D prefixStartIndex) ) { - size_t const dictHash =3D ZSTD_hashPtr(ip, dictHLog, mls); - U32 const dictMatchIndex =3D dictHashTable[dictHash]; - const BYTE* dictMatch =3D dictBase + dictMatchIndex; - if (dictMatchIndex <=3D dictStartIndex || - MEM_read32(dictMatch) !=3D MEM_read32(ip)) { - assert(stepSize >=3D 1); - ip +=3D ((ip-anchor) >> kSearchStrength) + stepSize; - continue; - } else { - /* found a dict match */ - U32 const offset =3D (U32)(curr-dictMatchIndex-dictIndexDe= lta); - mLength =3D ZSTD_count_2segments(ip+4, dictMatch+4, iend, = dictEnd, prefixStart) + 4; - while (((ip>anchor) & (dictMatch>dictStart)) - && (ip[-1] =3D=3D dictMatch[-1])) { - ip--; dictMatch--; mLength++; + size_t hash0 =3D ZSTD_hashPtr(ip0, hlog, mls); + + size_t const dictHashAndTag0 =3D ZSTD_hashPtr(ip0, dictHBits, mls); + U32 dictMatchIndexAndTag =3D dictHashTable[dictHashAndTag0 >> ZSTD= _SHORT_CACHE_TAG_BITS]; + int dictTagsMatch =3D ZSTD_comparePackedTags(dictMatchIndexAndTag,= dictHashAndTag0); + + U32 matchIndex =3D hashTable[hash0]; + U32 curr =3D (U32)(ip0 - base); + size_t step =3D stepSize; + const size_t kStepIncr =3D 1 << kSearchStrength; + const BYTE* nextStep =3D ip0 + kStepIncr; + + /* Inner search loop */ + while (1) { + const BYTE* match =3D base + matchIndex; + const U32 repIndex =3D curr + 1 - offset_1; + const BYTE* repMatch =3D (repIndex < prefixStartIndex) ? + dictBase + (repIndex - dictIndexDelta) : + base + repIndex; + const size_t hash1 =3D ZSTD_hashPtr(ip1, hlog, mls); + size_t const dictHashAndTag1 =3D ZSTD_hashPtr(ip1, dictHBits, = mls); + hashTable[hash0] =3D curr; /* update hash table */ + + if ((ZSTD_index_overlap_check(prefixStartIndex, repIndex)) + && (MEM_read32(repMatch) =3D=3D MEM_read32(ip0 + 1))) { + const BYTE* const repMatchEnd =3D repIndex < prefixStartIn= dex ? 
dictEnd : iend; + mLength =3D ZSTD_count_2segments(ip0 + 1 + 4, repMatch + 4= , iend, repMatchEnd, prefixStart) + 4; + ip0++; + ZSTD_storeSeq(seqStore, (size_t) (ip0 - anchor), anchor, i= end, REPCODE1_TO_OFFBASE, mLength); + break; + } + + if (dictTagsMatch) { + /* Found a possible dict match */ + const U32 dictMatchIndex =3D dictMatchIndexAndTag >> ZSTD_= SHORT_CACHE_TAG_BITS; + const BYTE* dictMatch =3D dictBase + dictMatchIndex; + if (dictMatchIndex > dictStartIndex && + MEM_read32(dictMatch) =3D=3D MEM_read32(ip0)) { + /* To replicate extDict parse behavior, we only use di= ct matches when the normal matchIndex is invalid */ + if (matchIndex <=3D prefixStartIndex) { + U32 const offset =3D (U32) (curr - dictMatchIndex = - dictIndexDelta); + mLength =3D ZSTD_count_2segments(ip0 + 4, dictMatc= h + 4, iend, dictEnd, prefixStart) + 4; + while (((ip0 > anchor) & (dictMatch > dictStart)) + && (ip0[-1] =3D=3D dictMatch[-1])) { + ip0--; + dictMatch--; + mLength++; + } /* catch up */ + offset_2 =3D offset_1; + offset_1 =3D offset; + ZSTD_storeSeq(seqStore, (size_t) (ip0 - anchor), a= nchor, iend, OFFSET_TO_OFFBASE(offset), mLength); + break; + } + } + } + + if (ZSTD_match4Found_cmov(ip0, match, matchIndex, prefixStartI= ndex)) { + /* found a regular match of size >=3D 4 */ + U32 const offset =3D (U32) (ip0 - match); + mLength =3D ZSTD_count(ip0 + 4, match + 4, iend) + 4; + while (((ip0 > anchor) & (match > prefixStart)) + && (ip0[-1] =3D=3D match[-1])) { + ip0--; + match--; + mLength++; } /* catch up */ offset_2 =3D offset_1; offset_1 =3D offset; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend,= STORE_OFFSET(offset), mLength); + ZSTD_storeSeq(seqStore, (size_t) (ip0 - anchor), anchor, i= end, OFFSET_TO_OFFBASE(offset), mLength); + break; } - } else if (MEM_read32(match) !=3D MEM_read32(ip)) { - /* it's not a match, and we're not going to check the dictiona= ry */ - assert(stepSize >=3D 1); - ip +=3D ((ip-anchor) >> kSearchStrength) + stepSize; - continue; - } else { - /* found a regular match */ - U32 const offset =3D (U32)(ip-match); - mLength =3D ZSTD_count(ip+4, match+4, iend) + 4; - while (((ip>anchor) & (match>prefixStart)) - && (ip[-1] =3D=3D match[-1])) { ip--; match--; mLength++;= } /* catch up */ - offset_2 =3D offset_1; - offset_1 =3D offset; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, STO= RE_OFFSET(offset), mLength); - } + + /* Prepare for next iteration */ + dictMatchIndexAndTag =3D dictHashTable[dictHashAndTag1 >> ZSTD= _SHORT_CACHE_TAG_BITS]; + dictTagsMatch =3D ZSTD_comparePackedTags(dictMatchIndexAndTag,= dictHashAndTag1); + matchIndex =3D hashTable[hash1]; + + if (ip1 >=3D nextStep) { + step++; + nextStep +=3D kStepIncr; + } + ip0 =3D ip1; + ip1 =3D ip1 + step; + if (ip1 > ilimit) goto _cleanup; + + curr =3D (U32)(ip0 - base); + hash0 =3D hash1; + } /* end inner search loop */ =20 /* match found */ - ip +=3D mLength; - anchor =3D ip; + assert(mLength); + ip0 +=3D mLength; + anchor =3D ip0; =20 - if (ip <=3D ilimit) { + if (ip0 <=3D ilimit) { /* Fill Table */ assert(base+curr+2 > istart); /* check base overflow */ hashTable[ZSTD_hashPtr(base+curr+2, hlog, mls)] =3D curr+2; /= * here because curr+2 could be > iend-8 */ - hashTable[ZSTD_hashPtr(ip-2, hlog, mls)] =3D (U32)(ip-2-base); + hashTable[ZSTD_hashPtr(ip0-2, hlog, mls)] =3D (U32)(ip0-2-base= ); =20 /* check immediate repcode */ - while (ip <=3D ilimit) { - U32 const current2 =3D (U32)(ip-base); + while (ip0 <=3D ilimit) { + U32 const current2 =3D (U32)(ip0-base); U32 const repIndex2 =3D 
current2 - offset_2; const BYTE* repMatch2 =3D repIndex2 < prefixStartIndex ? dictBase - dictIndexDelta + repIndex2 : base + repIndex2; - if ( ((U32)((prefixStartIndex-1) - (U32)repIndex2) >=3D 3 = /* intentional overflow */) - && (MEM_read32(repMatch2) =3D=3D MEM_read32(ip)) ) { + if ( (ZSTD_index_overlap_check(prefixStartIndex, repIndex2= )) + && (MEM_read32(repMatch2) =3D=3D MEM_read32(ip0))) { const BYTE* const repEnd2 =3D repIndex2 < prefixStartI= ndex ? dictEnd : iend; - size_t const repLength2 =3D ZSTD_count_2segments(ip+4,= repMatch2+4, iend, repEnd2, prefixStart) + 4; + size_t const repLength2 =3D ZSTD_count_2segments(ip0+4= , repMatch2+4, iend, repEnd2, prefixStart) + 4; U32 tmpOffset =3D offset_2; offset_2 =3D offset_1; off= set_1 =3D tmpOffset; /* swap offset_2 <=3D> offset_1 */ - ZSTD_storeSeq(seqStore, 0, anchor, iend, STORE_REPCODE= _1, repLength2); - hashTable[ZSTD_hashPtr(ip, hlog, mls)] =3D current2; - ip +=3D repLength2; - anchor =3D ip; + ZSTD_storeSeq(seqStore, 0, anchor, iend, REPCODE1_TO_O= FFBASE, repLength2); + hashTable[ZSTD_hashPtr(ip0, hlog, mls)] =3D current2; + ip0 +=3D repLength2; + anchor =3D ip0; continue; } break; } } + + /* Prepare for next iteration */ + assert(ip0 =3D=3D anchor); + ip1 =3D ip0 + stepSize; } =20 +_cleanup: /* save reps for next block */ - rep[0] =3D offset_1 ? offset_1 : offsetSaved; - rep[1] =3D offset_2 ? offset_2 : offsetSaved; + rep[0] =3D offset_1; + rep[1] =3D offset_2; =20 /* Return the last literals size */ return (size_t)(iend - anchor); @@ -525,7 +683,7 @@ ZSTD_GEN_FAST_FN(dictMatchState, 6, 0) ZSTD_GEN_FAST_FN(dictMatchState, 7, 0) =20 size_t ZSTD_compressBlock_fast_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { U32 const mls =3D ms->cParams.minMatch; @@ -545,19 +703,20 @@ size_t ZSTD_compressBlock_fast_dictMatchState( } =20 =20 -static size_t ZSTD_compressBlock_fast_extDict_generic( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t ZSTD_compressBlock_fast_extDict_generic( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize, U32 const mls, U32 const hasStep) { const ZSTD_compressionParameters* const cParams =3D &ms->cParams; U32* const hashTable =3D ms->hashTable; U32 const hlog =3D cParams->hashLog; /* support stepSize of 0 */ - U32 const stepSize =3D cParams->targetLength + !(cParams->targetLength= ); + size_t const stepSize =3D cParams->targetLength + !(cParams->targetLen= gth) + 1; const BYTE* const base =3D ms->window.base; const BYTE* const dictBase =3D ms->window.dictBase; const BYTE* const istart =3D (const BYTE*)src; - const BYTE* ip =3D istart; const BYTE* anchor =3D istart; const U32 endIndex =3D (U32)((size_t)(istart - base) + srcSize); const U32 lowLimit =3D ZSTD_getLowestMatchIndex(ms, endIndex, cParam= s->windowLog); @@ -570,6 +729,28 @@ static size_t ZSTD_compressBlock_fast_extDict_generic( const BYTE* const iend =3D istart + srcSize; const BYTE* const ilimit =3D iend - 8; U32 offset_1=3Drep[0], offset_2=3Drep[1]; + U32 offsetSaved1 =3D 0, offsetSaved2 =3D 0; + + const BYTE* ip0 =3D istart; + const BYTE* ip1; + const BYTE* ip2; + const BYTE* ip3; + U32 current0; + + + size_t hash0; /* hash for ip0 */ + size_t hash1; /* hash for ip1 */ + U32 idx; /* match idx for ip0 */ + const BYTE* idxBase; /* base pointer for idx */ + + U32 offcode; + const BYTE* 
match0; + size_t mLength; + const BYTE* matchEnd =3D 0; /* initialize to avoid warning, assert != =3D 0 later */ + + size_t step; + const BYTE* nextStep; + const size_t kStepIncr =3D (1 << (kSearchStrength - 1)); =20 (void)hasStep; /* not currently specialized on whether it's accelerate= d */ =20 @@ -579,75 +760,202 @@ static size_t ZSTD_compressBlock_fast_extDict_generi= c( if (prefixStartIndex =3D=3D dictStartIndex) return ZSTD_compressBlock_fast(ms, seqStore, rep, src, srcSize); =20 - /* Search Loop */ - while (ip < ilimit) { /* < instead of <=3D, because (ip+1) */ - const size_t h =3D ZSTD_hashPtr(ip, hlog, mls); - const U32 matchIndex =3D hashTable[h]; - const BYTE* const matchBase =3D matchIndex < prefixStartIndex ? di= ctBase : base; - const BYTE* match =3D matchBase + matchIndex; - const U32 curr =3D (U32)(ip-base); - const U32 repIndex =3D curr + 1 - offset_1; - const BYTE* const repBase =3D repIndex < prefixStartIndex ? dictBa= se : base; - const BYTE* const repMatch =3D repBase + repIndex; - hashTable[h] =3D curr; /* update hash table */ - DEBUGLOG(7, "offset_1 =3D %u , curr =3D %u", offset_1, curr); - - if ( ( ((U32)((prefixStartIndex-1) - repIndex) >=3D 3) /* intentio= nal underflow */ - & (offset_1 <=3D curr+1 - dictStartIndex) ) /* note: we are s= earching at curr+1 */ - && (MEM_read32(repMatch) =3D=3D MEM_read32(ip+1)) ) { - const BYTE* const repMatchEnd =3D repIndex < prefixStartIndex = ? dictEnd : iend; - size_t const rLength =3D ZSTD_count_2segments(ip+1 +4, repMatc= h +4, iend, repMatchEnd, prefixStart) + 4; - ip++; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, STO= RE_REPCODE_1, rLength); - ip +=3D rLength; - anchor =3D ip; - } else { - if ( (matchIndex < dictStartIndex) || - (MEM_read32(match) !=3D MEM_read32(ip)) ) { - assert(stepSize >=3D 1); - ip +=3D ((ip-anchor) >> kSearchStrength) + stepSize; - continue; + { U32 const curr =3D (U32)(ip0 - base); + U32 const maxRep =3D curr - dictStartIndex; + if (offset_2 >=3D maxRep) offsetSaved2 =3D offset_2, offset_2 =3D = 0; + if (offset_1 >=3D maxRep) offsetSaved1 =3D offset_1, offset_1 =3D = 0; + } + + /* start each op */ +_start: /* Requires: ip0 */ + + step =3D stepSize; + nextStep =3D ip0 + kStepIncr; + + /* calculate positions, ip0 - anchor =3D=3D 0, so we skip step calc */ + ip1 =3D ip0 + 1; + ip2 =3D ip0 + step; + ip3 =3D ip2 + 1; + + if (ip3 >=3D ilimit) { + goto _cleanup; + } + + hash0 =3D ZSTD_hashPtr(ip0, hlog, mls); + hash1 =3D ZSTD_hashPtr(ip1, hlog, mls); + + idx =3D hashTable[hash0]; + idxBase =3D idx < prefixStartIndex ? dictBase : base; + + do { + { /* load repcode match for ip[2] */ + U32 const current2 =3D (U32)(ip2 - base); + U32 const repIndex =3D current2 - offset_1; + const BYTE* const repBase =3D repIndex < prefixStartIndex ? di= ctBase : base; + U32 rval; + if ( ((U32)(prefixStartIndex - repIndex) >=3D 4) /* intentiona= l underflow */ + & (offset_1 > 0) ) { + rval =3D MEM_read32(repBase + repIndex); + } else { + rval =3D MEM_read32(ip2) ^ 1; /* guaranteed to not match. = */ } - { const BYTE* const matchEnd =3D matchIndex < prefixStartInd= ex ? dictEnd : iend; - const BYTE* const lowMatchPtr =3D matchIndex < prefixStart= Index ? 
dictStart : prefixStart; - U32 const offset =3D curr - matchIndex; - size_t mLength =3D ZSTD_count_2segments(ip+4, match+4, ien= d, matchEnd, prefixStart) + 4; - while (((ip>anchor) & (match>lowMatchPtr)) && (ip[-1] =3D= =3D match[-1])) { ip--; match--; mLength++; } /* catch up */ - offset_2 =3D offset_1; offset_1 =3D offset; /* update off= set history */ - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend,= STORE_OFFSET(offset), mLength); - ip +=3D mLength; - anchor =3D ip; + + /* write back hash table entry */ + current0 =3D (U32)(ip0 - base); + hashTable[hash0] =3D current0; + + /* check repcode at ip[2] */ + if (MEM_read32(ip2) =3D=3D rval) { + ip0 =3D ip2; + match0 =3D repBase + repIndex; + matchEnd =3D repIndex < prefixStartIndex ? dictEnd : iend; + assert((match0 !=3D prefixStart) & (match0 !=3D dictStart)= ); + mLength =3D ip0[-1] =3D=3D match0[-1]; + ip0 -=3D mLength; + match0 -=3D mLength; + offcode =3D REPCODE1_TO_OFFBASE; + mLength +=3D 4; + goto _match; } } =20 - if (ip <=3D ilimit) { - /* Fill Table */ - hashTable[ZSTD_hashPtr(base+curr+2, hlog, mls)] =3D curr+2; - hashTable[ZSTD_hashPtr(ip-2, hlog, mls)] =3D (U32)(ip-2-base); - /* check immediate repcode */ - while (ip <=3D ilimit) { - U32 const current2 =3D (U32)(ip-base); - U32 const repIndex2 =3D current2 - offset_2; - const BYTE* const repMatch2 =3D repIndex2 < prefixStartInd= ex ? dictBase + repIndex2 : base + repIndex2; - if ( (((U32)((prefixStartIndex-1) - repIndex2) >=3D 3) & (= offset_2 <=3D curr - dictStartIndex)) /* intentional overflow */ - && (MEM_read32(repMatch2) =3D=3D MEM_read32(ip)) ) { - const BYTE* const repEnd2 =3D repIndex2 < prefixStartI= ndex ? dictEnd : iend; - size_t const repLength2 =3D ZSTD_count_2segments(ip+4,= repMatch2+4, iend, repEnd2, prefixStart) + 4; - { U32 const tmpOffset =3D offset_2; offset_2 =3D offse= t_1; offset_1 =3D tmpOffset; } /* swap offset_2 <=3D> offset_1 */ - ZSTD_storeSeq(seqStore, 0 /*litlen*/, anchor, iend, ST= ORE_REPCODE_1, repLength2); - hashTable[ZSTD_hashPtr(ip, hlog, mls)] =3D current2; - ip +=3D repLength2; - anchor =3D ip; - continue; - } - break; - } } } + { /* load match for ip[0] */ + U32 const mval =3D idx >=3D dictStartIndex ? + MEM_read32(idxBase + idx) : + MEM_read32(ip0) ^ 1; /* guaranteed not to match */ + + /* check match at ip[0] */ + if (MEM_read32(ip0) =3D=3D mval) { + /* found a match! */ + goto _offset; + } } + + /* lookup ip[1] */ + idx =3D hashTable[hash1]; + idxBase =3D idx < prefixStartIndex ? dictBase : base; + + /* hash ip[2] */ + hash0 =3D hash1; + hash1 =3D ZSTD_hashPtr(ip2, hlog, mls); + + /* advance to next positions */ + ip0 =3D ip1; + ip1 =3D ip2; + ip2 =3D ip3; + + /* write back hash table entry */ + current0 =3D (U32)(ip0 - base); + hashTable[hash0] =3D current0; + + { /* load match for ip[0] */ + U32 const mval =3D idx >=3D dictStartIndex ? + MEM_read32(idxBase + idx) : + MEM_read32(ip0) ^ 1; /* guaranteed not to match */ + + /* check match at ip[0] */ + if (MEM_read32(ip0) =3D=3D mval) { + /* found a match! */ + goto _offset; + } } + + /* lookup ip[1] */ + idx =3D hashTable[hash1]; + idxBase =3D idx < prefixStartIndex ? 
dictBase : base; + + /* hash ip[2] */ + hash0 =3D hash1; + hash1 =3D ZSTD_hashPtr(ip2, hlog, mls); + + /* advance to next positions */ + ip0 =3D ip1; + ip1 =3D ip2; + ip2 =3D ip0 + step; + ip3 =3D ip1 + step; + + /* calculate step */ + if (ip2 >=3D nextStep) { + step++; + PREFETCH_L1(ip1 + 64); + PREFETCH_L1(ip1 + 128); + nextStep +=3D kStepIncr; + } + } while (ip3 < ilimit); + +_cleanup: + /* Note that there are probably still a couple positions we could sear= ch. + * However, it seems to be a meaningful performance hit to try to sear= ch + * them. So let's not. */ + + /* If offset_1 started invalid (offsetSaved1 !=3D 0) and became valid = (offset_1 !=3D 0), + * rotate saved offsets. See comment in ZSTD_compressBlock_fast_noDict= for more context. */ + offsetSaved2 =3D ((offsetSaved1 !=3D 0) && (offset_1 !=3D 0)) ? offset= Saved1 : offsetSaved2; =20 /* save reps for next block */ - rep[0] =3D offset_1; - rep[1] =3D offset_2; + rep[0] =3D offset_1 ? offset_1 : offsetSaved1; + rep[1] =3D offset_2 ? offset_2 : offsetSaved2; =20 /* Return the last literals size */ return (size_t)(iend - anchor); + +_offset: /* Requires: ip0, idx, idxBase */ + + /* Compute the offset code. */ + { U32 const offset =3D current0 - idx; + const BYTE* const lowMatchPtr =3D idx < prefixStartIndex ? dictSta= rt : prefixStart; + matchEnd =3D idx < prefixStartIndex ? dictEnd : iend; + match0 =3D idxBase + idx; + offset_2 =3D offset_1; + offset_1 =3D offset; + offcode =3D OFFSET_TO_OFFBASE(offset); + mLength =3D 4; + + /* Count the backwards match length. */ + while (((ip0>anchor) & (match0>lowMatchPtr)) && (ip0[-1] =3D=3D ma= tch0[-1])) { + ip0--; + match0--; + mLength++; + } } + +_match: /* Requires: ip0, match0, offcode, matchEnd */ + + /* Count the forward length. */ + assert(matchEnd !=3D 0); + mLength +=3D ZSTD_count_2segments(ip0 + mLength, match0 + mLength, ien= d, matchEnd, prefixStart); + + ZSTD_storeSeq(seqStore, (size_t)(ip0 - anchor), anchor, iend, offcode,= mLength); + + ip0 +=3D mLength; + anchor =3D ip0; + + /* write next hash table entry */ + if (ip1 < ip0) { + hashTable[hash1] =3D (U32)(ip1 - base); + } + + /* Fill table and check for immediate repcode. */ + if (ip0 <=3D ilimit) { + /* Fill Table */ + assert(base+current0+2 > istart); /* check base overflow */ + hashTable[ZSTD_hashPtr(base+current0+2, hlog, mls)] =3D current0+2= ; /* here because current+2 could be > iend-8 */ + hashTable[ZSTD_hashPtr(ip0-2, hlog, mls)] =3D (U32)(ip0-2-base); + + while (ip0 <=3D ilimit) { + U32 const repIndex2 =3D (U32)(ip0-base) - offset_2; + const BYTE* const repMatch2 =3D repIndex2 < prefixStartIndex ?= dictBase + repIndex2 : base + repIndex2; + if ( ((ZSTD_index_overlap_check(prefixStartIndex, repIndex2)) = & (offset_2 > 0)) + && (MEM_read32(repMatch2) =3D=3D MEM_read32(ip0)) ) { + const BYTE* const repEnd2 =3D repIndex2 < prefixStartIndex= ? 
dictEnd : iend; + size_t const repLength2 =3D ZSTD_count_2segments(ip0+4, re= pMatch2+4, iend, repEnd2, prefixStart) + 4; + { U32 const tmpOffset =3D offset_2; offset_2 =3D offset_1;= offset_1 =3D tmpOffset; } /* swap offset_2 <=3D> offset_1 */ + ZSTD_storeSeq(seqStore, 0 /*litlen*/, anchor, iend, REPCOD= E1_TO_OFFBASE, repLength2); + hashTable[ZSTD_hashPtr(ip0, hlog, mls)] =3D (U32)(ip0-base= ); + ip0 +=3D repLength2; + anchor =3D ip0; + continue; + } + break; + } } + + goto _start; } =20 ZSTD_GEN_FAST_FN(extDict, 4, 0) @@ -656,10 +964,11 @@ ZSTD_GEN_FAST_FN(extDict, 6, 0) ZSTD_GEN_FAST_FN(extDict, 7, 0) =20 size_t ZSTD_compressBlock_fast_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { U32 const mls =3D ms->cParams.minMatch; + assert(ms->dictMatchState =3D=3D NULL); switch(mls) { default: /* includes case 3 */ diff --git a/lib/zstd/compress/zstd_fast.h b/lib/zstd/compress/zstd_fast.h index fddc2f532d21..04fde0a72a4e 100644 --- a/lib/zstd/compress/zstd_fast.h +++ b/lib/zstd/compress/zstd_fast.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -11,21 +12,20 @@ #ifndef ZSTD_FAST_H #define ZSTD_FAST_H =20 - #include "../common/mem.h" /* U32 */ #include "zstd_compress_internal.h" =20 -void ZSTD_fillHashTable(ZSTD_matchState_t* ms, - void const* end, ZSTD_dictTableLoadMethod_e dtlm); +void ZSTD_fillHashTable(ZSTD_MatchState_t* ms, + void const* end, ZSTD_dictTableLoadMethod_e dtlm, + ZSTD_tableFillPurpose_e tfp); size_t ZSTD_compressBlock_fast( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_fast_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_fast_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); =20 - #endif /* ZSTD_FAST_H */ diff --git a/lib/zstd/compress/zstd_lazy.c b/lib/zstd/compress/zstd_lazy.c index 0298a01a7504..88e2501fe3ef 100644 --- a/lib/zstd/compress/zstd_lazy.c +++ b/lib/zstd/compress/zstd_lazy.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -10,14 +11,23 @@ =20 #include "zstd_compress_internal.h" #include "zstd_lazy.h" +#include "../common/bits.h" /* ZSTD_countTrailingZeros64 */ + +#if !defined(ZSTD_EXCLUDE_GREEDY_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_LAZY_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_LAZY2_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR) + +#define kLazySkippingStep 8 =20 =20 /*-************************************* * Binary Tree search ***************************************/ =20 -static void -ZSTD_updateDUBT(ZSTD_matchState_t* ms, +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +void ZSTD_updateDUBT(ZSTD_MatchState_t* ms, const BYTE* ip, const BYTE* iend, U32 mls) { @@ -60,8 +70,9 @@ ZSTD_updateDUBT(ZSTD_matchState_t* ms, * sort one already inserted but unsorted position * assumption : curr >=3D btlow =3D=3D (curr - btmask) * doesn't fail */ -static void -ZSTD_insertDUBT1(const ZSTD_matchState_t* ms, +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +void ZSTD_insertDUBT1(const ZSTD_MatchState_t* ms, U32 curr, const BYTE* inputEnd, U32 nbCompares, U32 btLow, const ZSTD_dictMode_e dictMode) @@ -149,9 +160,10 @@ ZSTD_insertDUBT1(const ZSTD_matchState_t* ms, } =20 =20 -static size_t -ZSTD_DUBT_findBetterDictMatch ( - const ZSTD_matchState_t* ms, +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t ZSTD_DUBT_findBetterDictMatch ( + const ZSTD_MatchState_t* ms, const BYTE* const ip, const BYTE* const iend, size_t* offsetPtr, size_t bestLength, @@ -159,7 +171,7 @@ ZSTD_DUBT_findBetterDictMatch ( U32 const mls, const ZSTD_dictMode_e dictMode) { - const ZSTD_matchState_t * const dms =3D ms->dictMatchState; + const ZSTD_MatchState_t * const dms =3D ms->dictMatchState; const ZSTD_compressionParameters* const dmsCParams =3D &dms->cParams; const U32 * const dictHashTable =3D dms->hashTable; U32 const hashLog =3D dmsCParams->hashLog; @@ -197,8 +209,8 @@ ZSTD_DUBT_findBetterDictMatch ( U32 matchIndex =3D dictMatchIndex + dictIndexDelta; if ( (4*(int)(matchLength-bestLength)) > (int)(ZSTD_highbit32(= curr-matchIndex+1) - ZSTD_highbit32((U32)offsetPtr[0]+1)) ) { DEBUGLOG(9, "ZSTD_DUBT_findBetterDictMatch(%u) : found bet= ter match length %u -> %u and offsetCode %u -> %u (dictMatchIndex %u, match= Index %u)", - curr, (U32)bestLength, (U32)matchLength, (U32)*offsetP= tr, STORE_OFFSET(curr - matchIndex), dictMatchIndex, matchIndex); - bestLength =3D matchLength, *offsetPtr =3D STORE_OFFSET(cu= rr - matchIndex); + curr, (U32)bestLength, (U32)matchLength, (U32)*offsetP= tr, OFFSET_TO_OFFBASE(curr - matchIndex), dictMatchIndex, matchIndex); + bestLength =3D matchLength, *offsetPtr =3D OFFSET_TO_OFFBA= SE(curr - matchIndex); } if (ip+matchLength =3D=3D iend) { /* reached end of input : = ip[matchLength] is not valid, no way to know if it's larger or smaller than= match */ break; /* drop, to guarantee consistency (miss a little = bit of compression) */ @@ -218,7 +230,7 @@ ZSTD_DUBT_findBetterDictMatch ( } =20 if (bestLength >=3D MINMATCH) { - U32 const mIndex =3D curr - (U32)STORED_OFFSET(*offsetPtr); (void)= mIndex; + U32 const mIndex =3D curr - (U32)OFFBASE_TO_OFFSET(*offsetPtr); (v= oid)mIndex; DEBUGLOG(8, "ZSTD_DUBT_findBetterDictMatch(%u) : found match of le= ngth %u and offsetCode %u (pos %u)", curr, (U32)bestLength, (U32)*offsetPtr, mIndex); } @@ -227,10 +239,11 @@ ZSTD_DUBT_findBetterDictMatch ( } =20 =20 -static size_t -ZSTD_DUBT_findBestMatch(ZSTD_matchState_t* ms, +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t 
ZSTD_DUBT_findBestMatch(ZSTD_MatchState_t* ms, const BYTE* const ip, const BYTE* const iend, - size_t* offsetPtr, + size_t* offBasePtr, U32 const mls, const ZSTD_dictMode_e dictMode) { @@ -327,8 +340,8 @@ ZSTD_DUBT_findBestMatch(ZSTD_matchState_t* ms, if (matchLength > bestLength) { if (matchLength > matchEndIdx - matchIndex) matchEndIdx =3D matchIndex + (U32)matchLength; - if ( (4*(int)(matchLength-bestLength)) > (int)(ZSTD_highbi= t32(curr-matchIndex+1) - ZSTD_highbit32((U32)offsetPtr[0]+1)) ) - bestLength =3D matchLength, *offsetPtr =3D STORE_OFFSE= T(curr - matchIndex); + if ( (4*(int)(matchLength-bestLength)) > (int)(ZSTD_highbi= t32(curr - matchIndex + 1) - ZSTD_highbit32((U32)*offBasePtr)) ) + bestLength =3D matchLength, *offBasePtr =3D OFFSET_TO_= OFFBASE(curr - matchIndex); if (ip+matchLength =3D=3D iend) { /* equal : no way to k= now if inf or sup */ if (dictMode =3D=3D ZSTD_dictMatchState) { nbCompares =3D 0; /* in addition to avoiding check= ing any @@ -361,16 +374,16 @@ ZSTD_DUBT_findBestMatch(ZSTD_matchState_t* ms, if (dictMode =3D=3D ZSTD_dictMatchState && nbCompares) { bestLength =3D ZSTD_DUBT_findBetterDictMatch( ms, ip, iend, - offsetPtr, bestLength, nbCompares, + offBasePtr, bestLength, nbCompares, mls, dictMode); } =20 assert(matchEndIdx > curr+8); /* ensure nextToUpdate is increased = */ ms->nextToUpdate =3D matchEndIdx - 8; /* skip repetitive pattern= s */ if (bestLength >=3D MINMATCH) { - U32 const mIndex =3D curr - (U32)STORED_OFFSET(*offsetPtr); (v= oid)mIndex; + U32 const mIndex =3D curr - (U32)OFFBASE_TO_OFFSET(*offBasePtr= ); (void)mIndex; DEBUGLOG(8, "ZSTD_DUBT_findBestMatch(%u) : found match of leng= th %u and offsetCode %u (pos %u)", - curr, (U32)bestLength, (U32)*offsetPtr, mIndex); + curr, (U32)bestLength, (U32)*offBasePtr, mIndex); } return bestLength; } @@ -378,24 +391,25 @@ ZSTD_DUBT_findBestMatch(ZSTD_matchState_t* ms, =20 =20 /* ZSTD_BtFindBestMatch() : Tree updater, providing best match */ -FORCE_INLINE_TEMPLATE size_t -ZSTD_BtFindBestMatch( ZSTD_matchState_t* ms, +FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t ZSTD_BtFindBestMatch( ZSTD_MatchState_t* ms, const BYTE* const ip, const BYTE* const iLimit, - size_t* offsetPtr, + size_t* offBasePtr, const U32 mls /* template */, const ZSTD_dictMode_e dictMode) { DEBUGLOG(7, "ZSTD_BtFindBestMatch"); if (ip < ms->window.base + ms->nextToUpdate) return 0; /* skipped ar= ea */ ZSTD_updateDUBT(ms, ip, iLimit, mls); - return ZSTD_DUBT_findBestMatch(ms, ip, iLimit, offsetPtr, mls, dictMod= e); + return ZSTD_DUBT_findBestMatch(ms, ip, iLimit, offBasePtr, mls, dictMo= de); } =20 /* ********************************* * Dedicated dict search ***********************************/ =20 -void ZSTD_dedicatedDictSearch_lazy_loadDictionary(ZSTD_matchState_t* ms, c= onst BYTE* const ip) +void ZSTD_dedicatedDictSearch_lazy_loadDictionary(ZSTD_MatchState_t* ms, c= onst BYTE* const ip) { const BYTE* const base =3D ms->window.base; U32 const target =3D (U32)(ip - base); @@ -514,7 +528,7 @@ void ZSTD_dedicatedDictSearch_lazy_loadDictionary(ZSTD_= matchState_t* ms, const B */ FORCE_INLINE_TEMPLATE size_t ZSTD_dedicatedDictSearch_lazy_search(size_t* offsetPtr, size_t ml, = U32 nbAttempts, - const ZSTD_matchState_t* const= dms, + const ZSTD_MatchState_t* const= dms, const BYTE* const ip, const BY= TE* const iLimit, const BYTE* const prefixStart,= const U32 curr, const U32 dictLimit, const siz= e_t ddsIdx) { @@ -561,7 +575,7 @@ size_t ZSTD_dedicatedDictSearch_lazy_search(size_t* off= setPtr, size_t ml, U32 nb /* save 
best solution */
         if (currentMl > ml) {
             ml = currentMl;
-            *offsetPtr = STORE_OFFSET(curr - (matchIndex + ddsIndexDelta));
+            *offsetPtr = OFFSET_TO_OFFBASE(curr - (matchIndex + ddsIndexDelta));
             if (ip+currentMl == iLimit) {
                 /* best possible, avoids read overflow on next attempt */
                 return ml;
@@ -598,7 +612,7 @@ size_t ZSTD_dedicatedDictSearch_lazy_search(size_t* offsetPtr, size_t ml, U32 nb
             /* save best solution */
             if (currentMl > ml) {
                 ml = currentMl;
-                *offsetPtr = STORE_OFFSET(curr - (matchIndex + ddsIndexDelta));
+                *offsetPtr = OFFSET_TO_OFFBASE(curr - (matchIndex + ddsIndexDelta));
                 if (ip+currentMl == iLimit) break; /* best possible, avoids read overflow on next attempt */
             }
         }
@@ -614,10 +628,12 @@ size_t ZSTD_dedicatedDictSearch_lazy_search(size_t* offsetPtr, size_t ml, U32 nb

 /* Update chains up to ip (excluded)
    Assumption : always within prefix (i.e. not within extDict) */
-FORCE_INLINE_TEMPLATE U32 ZSTD_insertAndFindFirstIndex_internal(
-                        ZSTD_matchState_t* ms,
+FORCE_INLINE_TEMPLATE
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+U32 ZSTD_insertAndFindFirstIndex_internal(
+                        ZSTD_MatchState_t* ms,
                         const ZSTD_compressionParameters* const cParams,
-                        const BYTE* ip, U32 const mls)
+                        const BYTE* ip, U32 const mls, U32 const lazySkipping)
 {
     U32* const hashTable  = ms->hashTable;
     const U32 hashLog = cParams->hashLog;
@@ -632,21 +648,25 @@ FORCE_INLINE_TEMPLATE U32 ZSTD_insertAndFindFirstIndex_internal(
         NEXT_IN_CHAIN(idx, chainMask) = hashTable[h];
         hashTable[h] = idx;
         idx++;
+        /* Stop inserting every position when in the lazy skipping mode. */
+        if (lazySkipping)
+            break;
     }

     ms->nextToUpdate = target;
     return hashTable[ZSTD_hashPtr(ip, hashLog, mls)];
 }

-U32 ZSTD_insertAndFindFirstIndex(ZSTD_matchState_t* ms, const BYTE* ip) {
+U32 ZSTD_insertAndFindFirstIndex(ZSTD_MatchState_t* ms, const BYTE* ip) {
     const ZSTD_compressionParameters* const cParams = &ms->cParams;
-    return ZSTD_insertAndFindFirstIndex_internal(ms, cParams, ip, ms->cParams.minMatch);
+    return ZSTD_insertAndFindFirstIndex_internal(ms, cParams, ip, ms->cParams.minMatch, /* lazySkipping*/ 0);
 }
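The new lazySkipping parameter above implements a simple bet: once the lazy parsers start striding over incompressible data, threading every intermediate position onto the hash chain costs more than the matches it might recover, so only the position actually being searched gets inserted. A self-contained sketch of the idea (toy hash and table sizes, not the patch's real parameters):

    /* Sketch: hash-chain insertion that degrades to a single insert
     * per lookup while the caller is in skipping mode. */
    #define TOY_BUCKETS    1024u
    #define TOY_CHAIN_MASK 0xFFFFu

    static unsigned toy_hash(const unsigned char* p)   /* needs 4 readable bytes */
    {
        unsigned v = (unsigned)p[0] | ((unsigned)p[1] << 8)
                   | ((unsigned)p[2] << 16) | ((unsigned)p[3] << 24);
        return (v * 2654435761u) >> 22;   /* 10-bit bucket index */
    }

    static void insert_up_to(unsigned* head, unsigned* chain,
                             const unsigned char* base,
                             unsigned idx, unsigned target, int lazySkipping)
    {
        for (; idx < target; ++idx) {
            unsigned const h = toy_hash(base + idx);
            chain[idx & TOY_CHAIN_MASK] = head[h];  /* link previous bucket head */
            head[h] = idx;
            if (lazySkipping)
                break;   /* skip mode: record only this one position */
        }
        /* caller advances its nextToUpdate cursor to target either way,
         * so positions skipped here are never revisited */
    }

The never-inserted positions simply become invisible to later searches, which costs a little ratio if the data turns compressible again; that is why the mode is (roughly) only engaged after the search step size has already grown large.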

 /* inlining is important to hardwire a hot branch (template emulation) */
 FORCE_INLINE_TEMPLATE
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
 size_t ZSTD_HcFindBestMatch(
-                        ZSTD_matchState_t* ms,
+                        ZSTD_MatchState_t* ms,
                         const BYTE* const ip, const BYTE* const iLimit,
                         size_t* offsetPtr,
                         const U32 mls, const ZSTD_dictMode_e dictMode)
@@ -670,7 +690,7 @@ size_t ZSTD_HcFindBestMatch(
     U32 nbAttempts = 1U << cParams->searchLog;
     size_t ml=4-1;

-    const ZSTD_matchState_t* const dms = ms->dictMatchState;
+    const ZSTD_MatchState_t* const dms = ms->dictMatchState;
     const U32 ddsHashLog = dictMode == ZSTD_dedicatedDictSearch ?
                            dms->cParams.hashLog - ZSTD_LAZY_DDSS_BUCKET_LOG : 0;
     const size_t ddsIdx = dictMode == ZSTD_dedicatedDictSearch
@@ -684,14 +704,15 @@ size_t ZSTD_HcFindBestMatch(
     }

     /* HC4 match finder */
-    matchIndex = ZSTD_insertAndFindFirstIndex_internal(ms, cParams, ip, mls);
+    matchIndex = ZSTD_insertAndFindFirstIndex_internal(ms, cParams, ip, mls, ms->lazySkipping);

     for ( ; (matchIndex>=lowLimit) & (nbAttempts>0) ; nbAttempts--) {
         size_t currentMl=0;
         if ((dictMode != ZSTD_extDict) || matchIndex >= dictLimit) {
             const BYTE* const match = base + matchIndex;
             assert(matchIndex >= dictLimit);   /* ensures this is true if dictMode != ZSTD_extDict */
-            if (match[ml] == ip[ml])   /* potentially better */
+            /* read 4B starting from (match + ml + 1 - sizeof(U32)) */
+            if (MEM_read32(match + ml - 3) == MEM_read32(ip + ml - 3))   /* potentially better */
                 currentMl = ZSTD_count(ip, match, iLimit);
         } else {
             const BYTE* const match = dictBase + matchIndex;
@@ -703,7 +724,7 @@ size_t ZSTD_HcFindBestMatch(
         /* save best solution */
         if (currentMl > ml) {
             ml = currentMl;
-            *offsetPtr = STORE_OFFSET(curr - matchIndex);
+            *offsetPtr = OFFSET_TO_OFFBASE(curr - matchIndex);
             if (ip+currentMl == iLimit) break; /* best possible, avoids read overflow on next attempt */
         }

@@ -739,7 +760,7 @@ size_t ZSTD_HcFindBestMatch(
                 if (currentMl > ml) {
                     ml = currentMl;
                     assert(curr > matchIndex + dmsIndexDelta);
-                    *offsetPtr = STORE_OFFSET(curr - (matchIndex + dmsIndexDelta));
+                    *offsetPtr = OFFSET_TO_OFFBASE(curr - (matchIndex + dmsIndexDelta));
                     if (ip+currentMl == iLimit) break; /* best possible, avoids read overflow on next attempt */
                 }

@@ -756,8 +777,6 @@ size_t ZSTD_HcFindBestMatch(
 * (SIMD) Row-based matchfinder
 ***********************************/
 /* Constants for row-based hash */
-#define ZSTD_ROW_HASH_TAG_OFFSET 16     /* byte offset of hashes in the match state's tagTable from the beginning of a row */
-#define ZSTD_ROW_HASH_TAG_BITS 8        /* nb bits to use for the tag */
 #define ZSTD_ROW_HASH_TAG_MASK ((1u << ZSTD_ROW_HASH_TAG_BITS) - 1)
 #define ZSTD_ROW_HASH_MAX_ENTRIES 64    /* absolute maximum number of entries per row, for all configurations */

@@ -769,64 +788,19 @@ typedef U64 ZSTD_VecMask;   /* Clarifies when we are interacting with a U64 repr
 * Starting from the LSB, returns the idx of the next non-zero bit.
 * Basically counting the nb of trailing zeroes.
 */
-static U32 ZSTD_VecMask_next(ZSTD_VecMask val) {
-    assert(val != 0);
-#   if (defined(__GNUC__) && ((__GNUC__ > 3) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 4))))
-    if (sizeof(size_t) == 4) {
-        U32 mostSignificantWord = (U32)(val >> 32);
-        U32 leastSignificantWord = (U32)val;
-        if (leastSignificantWord == 0) {
-            return 32 + (U32)__builtin_ctz(mostSignificantWord);
-        } else {
-            return (U32)__builtin_ctz(leastSignificantWord);
-        }
-    } else {
-        return (U32)__builtin_ctzll(val);
-    }
-#   else
-    /* Software ctz version: http://aggregate.org/MAGIC/#Trailing%20Zero%20Count
-     * and: https://stackoverflow.com/questions/2709430/count-number-of-bits-in-a-64-bit-long-big-integer
-     */
-    val = ~val & (val - 1ULL); /* Lowest set bit mask */
-    val = val - ((val >> 1) & 0x5555555555555555);
-    val = (val & 0x3333333333333333ULL) + ((val >> 2) & 0x3333333333333333ULL);
-    return (U32)((((val + (val >> 4)) & 0xF0F0F0F0F0F0F0FULL) * 0x101010101010101ULL) >> 56);
-#   endif
-}
-
-/* ZSTD_rotateRight_*():
- * Rotates a bitfield to the right by "count" bits.
- * https://en.wikipedia.org/w/index.php?title=Circular_shift&oldid=991635599#Implementing_circular_shifts
- */
-FORCE_INLINE_TEMPLATE
-U64 ZSTD_rotateRight_U64(U64 const value, U32 count) {
-    assert(count < 64);
-    count &= 0x3F; /* for fickle pattern recognition */
-    return (value >> count) | (U64)(value << ((0U - count) & 0x3F));
-}
-
-FORCE_INLINE_TEMPLATE
-U32 ZSTD_rotateRight_U32(U32 const value, U32 count) {
-    assert(count < 32);
-    count &= 0x1F; /* for fickle pattern recognition */
-    return (value >> count) | (U32)(value << ((0U - count) & 0x1F));
-}
-
-FORCE_INLINE_TEMPLATE
-U16 ZSTD_rotateRight_U16(U16 const value, U32 count) {
-    assert(count < 16);
-    count &= 0x0F; /* for fickle pattern recognition */
-    return (value >> count) | (U16)(value << ((0U - count) & 0x0F));
+MEM_STATIC U32 ZSTD_VecMask_next(ZSTD_VecMask val) {
+    return ZSTD_countTrailingZeros64(val);
 }

 /* ZSTD_row_nextIndex():
  * Returns the next index to insert at within a tagTable row, and updates the "head"
- * value to reflect the update. Essentially cycles backwards from [0, {entries per row})
+ * value to reflect the update. Essentially cycles backwards from [1, {entries per row})
  */
 FORCE_INLINE_TEMPLATE U32 ZSTD_row_nextIndex(BYTE* const tagRow, U32 const rowMask) {
-    U32 const next = (*tagRow - 1) & rowMask;
-    *tagRow = (BYTE)next;
-    return next;
+    U32 next = (*tagRow-1) & rowMask;
+    next += (next == 0) ? rowMask : 0; /* skip first position */
+    *tagRow = (BYTE)next;
+    return next;
 }
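The reworked ZSTD_row_nextIndex above deliberately never returns slot 0: with the tagTable now a plain byte array, the first byte of each row appears to double as the row's own insertion cursor (the removal of the old ZSTD_ROW_HASH_TAG_OFFSET bookkeeping suggests the same), so tag entries cycle backwards through [1, rowEntries). A sketch of the update with simplified names (an assumption-labeled reading of the patch, not upstream's documentation):

    /* Sketch: ring-buffer cursor stored inside slot 0 of the row itself. */
    static unsigned row_next_index(unsigned char* tagRow, unsigned rowMask)
    {
        unsigned next = (*tagRow - 1u) & rowMask;   /* step backwards */
        next += (next == 0) ? rowMask : 0;          /* slot 0 holds the cursor: hop over it */
        *tagRow = (unsigned char)next;              /* persist the cursor in slot 0 */
        return next;
    }

Reserving slot 0 gives up one tag entry per row, but it lets the head live inline with the tags in a single cache line instead of in a separate offset region.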

 /* ZSTD_isAligned():
@@ -840,7 +814,7 @@ MEM_STATIC int ZSTD_isAligned(void const* ptr, size_t align) {
 /* ZSTD_row_prefetch():
  * Performs prefetching for the hashTable and tagTable at a given row.
  */
-FORCE_INLINE_TEMPLATE void ZSTD_row_prefetch(U32 const* hashTable, U16 const* tagTable, U32 const relRow, U32 const rowLog) {
+FORCE_INLINE_TEMPLATE void ZSTD_row_prefetch(U32 const* hashTable, BYTE const* tagTable, U32 const relRow, U32 const rowLog) {
     PREFETCH_L1(hashTable + relRow);
     if (rowLog >= 5) {
         PREFETCH_L1(hashTable + relRow + 16);
@@ -859,18 +833,20 @@ FORCE_INLINE_TEMPLATE void ZSTD_row_prefetch(U32 const* hashTable, U16 const* ta
 * Fill up the hash cache starting at idx, prefetching up to ZSTD_ROW_HASH_CACHE_SIZE entries,
 * but not beyond iLimit.
 */
-FORCE_INLINE_TEMPLATE void ZSTD_row_fillHashCache(ZSTD_matchState_t* ms, const BYTE* base,
+FORCE_INLINE_TEMPLATE
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+void ZSTD_row_fillHashCache(ZSTD_MatchState_t* ms, const BYTE* base,
                                    U32 const rowLog, U32 const mls,
                                    U32 idx, const BYTE* const iLimit)
 {
     U32 const* const hashTable = ms->hashTable;
-    U16 const* const tagTable = ms->tagTable;
+    BYTE const* const tagTable = ms->tagTable;
     U32 const hashLog = ms->rowHashLog;
     U32 const maxElemsToPrefetch = (base + idx) > iLimit ? 0 : (U32)(iLimit - (base + idx) + 1);
     U32 const lim = idx + MIN(ZSTD_ROW_HASH_CACHE_SIZE, maxElemsToPrefetch);

     for (; idx < lim; ++idx) {
-        U32 const hash = (U32)ZSTD_hashPtr(base + idx, hashLog + ZSTD_ROW_HASH_TAG_BITS, mls);
+        U32 const hash = (U32)ZSTD_hashPtrSalted(base + idx, hashLog + ZSTD_ROW_HASH_TAG_BITS, mls, ms->hashSalt);
         U32 const row = (hash >> ZSTD_ROW_HASH_TAG_BITS) << rowLog;
         ZSTD_row_prefetch(hashTable, tagTable, row, rowLog);
         ms->hashCache[idx & ZSTD_ROW_HASH_CACHE_MASK] = hash;
@@ -885,12 +861,15 @@ FORCE_INLINE_TEMPLATE void ZSTD_row_fillHashCache(ZSTD_matchState_t* ms, const B
 * Returns the hash of base + idx, and replaces the hash in the hash cache with the byte at
 * base + idx + ZSTD_ROW_HASH_CACHE_SIZE. Also prefetches the appropriate rows from hashTable and tagTable.
 */
-FORCE_INLINE_TEMPLATE U32 ZSTD_row_nextCachedHash(U32* cache, U32 const* hashTable,
-                                                  U16 const* tagTable, BYTE const* base,
+FORCE_INLINE_TEMPLATE
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+U32 ZSTD_row_nextCachedHash(U32* cache, U32 const* hashTable,
+                                                  BYTE const* tagTable, BYTE const* base,
                                                   U32 idx, U32 const hashLog,
-                                                  U32 const rowLog, U32 const mls)
+                                                  U32 const rowLog, U32 const mls,
+                                                  U64 const hashSalt)
 {
-    U32 const newHash = (U32)ZSTD_hashPtr(base+idx+ZSTD_ROW_HASH_CACHE_SIZE, hashLog + ZSTD_ROW_HASH_TAG_BITS, mls);
+    U32 const newHash = (U32)ZSTD_hashPtrSalted(base+idx+ZSTD_ROW_HASH_CACHE_SIZE, hashLog + ZSTD_ROW_HASH_TAG_BITS, mls, hashSalt);
     U32 const row = (newHash >> ZSTD_ROW_HASH_TAG_BITS) << rowLog;
     ZSTD_row_prefetch(hashTable, tagTable, row, rowLog);
     {   U32 const hash = cache[idx & ZSTD_ROW_HASH_CACHE_MASK];
@@ -902,28 +881,29 @@ FORCE_INLINE_TEMPLATE U32 ZSTD_row_nextCachedHash(U32* cache, U32 const* hashTab
 /* ZSTD_row_update_internalImpl():
 * Updates the hash table with positions starting from updateStartIdx until updateEndIdx.
 */
-FORCE_INLINE_TEMPLATE void ZSTD_row_update_internalImpl(ZSTD_matchState_t* ms,
-                                                        U32 updateStartIdx, U32 const updateEndIdx,
-                                                        U32 const mls, U32 const rowLog,
-                                                        U32 const rowMask, U32 const useCache)
+FORCE_INLINE_TEMPLATE
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+void ZSTD_row_update_internalImpl(ZSTD_MatchState_t* ms,
+                                  U32 updateStartIdx, U32 const updateEndIdx,
+                                  U32 const mls, U32 const rowLog,
+                                  U32 const rowMask, U32 const useCache)
 {
     U32* const hashTable = ms->hashTable;
-    U16* const tagTable = ms->tagTable;
+    BYTE* const tagTable = ms->tagTable;
     U32 const hashLog = ms->rowHashLog;
     const BYTE* const base = ms->window.base;

     DEBUGLOG(6, "ZSTD_row_update_internalImpl(): updateStartIdx=%u, updateEndIdx=%u", updateStartIdx, updateEndIdx);
     for (; updateStartIdx < updateEndIdx; ++updateStartIdx) {
-        U32 const hash = useCache ? ZSTD_row_nextCachedHash(ms->hashCache, hashTable, tagTable, base, updateStartIdx, hashLog, rowLog, mls)
-                                  : (U32)ZSTD_hashPtr(base + updateStartIdx, hashLog + ZSTD_ROW_HASH_TAG_BITS, mls);
+        U32 const hash = useCache ? ZSTD_row_nextCachedHash(ms->hashCache, hashTable, tagTable, base, updateStartIdx, hashLog, rowLog, mls, ms->hashSalt)
+                                  : (U32)ZSTD_hashPtrSalted(base + updateStartIdx, hashLog + ZSTD_ROW_HASH_TAG_BITS, mls, ms->hashSalt);
         U32 const relRow = (hash >> ZSTD_ROW_HASH_TAG_BITS) << rowLog;
         U32* const row = hashTable + relRow;
-        BYTE* tagRow = (BYTE*)(tagTable + relRow);  /* Though tagTable is laid out as a table of U16, each tag is only 1 byte.
- Explicit cast allow= s us to get exact desired position within each row */ + BYTE* tagRow =3D tagTable + relRow; U32 const pos =3D ZSTD_row_nextIndex(tagRow, rowMask); =20 - assert(hash =3D=3D ZSTD_hashPtr(base + updateStartIdx, hashLog + Z= STD_ROW_HASH_TAG_BITS, mls)); - ((BYTE*)tagRow)[pos + ZSTD_ROW_HASH_TAG_OFFSET] =3D hash & ZSTD_RO= W_HASH_TAG_MASK; + assert(hash =3D=3D ZSTD_hashPtrSalted(base + updateStartIdx, hashL= og + ZSTD_ROW_HASH_TAG_BITS, mls, ms->hashSalt)); + tagRow[pos] =3D hash & ZSTD_ROW_HASH_TAG_MASK; row[pos] =3D updateStartIdx; } } @@ -932,9 +912,11 @@ FORCE_INLINE_TEMPLATE void ZSTD_row_update_internalImp= l(ZSTD_matchState_t* ms, * Inserts the byte at ip into the appropriate position in the hash table,= and updates ms->nextToUpdate. * Skips sections of long matches as is necessary. */ -FORCE_INLINE_TEMPLATE void ZSTD_row_update_internal(ZSTD_matchState_t* ms,= const BYTE* ip, - U32 const mls, U32 con= st rowLog, - U32 const rowMask, U32= const useCache) +FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +void ZSTD_row_update_internal(ZSTD_MatchState_t* ms, const BYTE* ip, + U32 const mls, U32 const rowLog, + U32 const rowMask, U32 const useCache) { U32 idx =3D ms->nextToUpdate; const BYTE* const base =3D ms->window.base; @@ -965,13 +947,41 @@ FORCE_INLINE_TEMPLATE void ZSTD_row_update_internal(Z= STD_matchState_t* ms, const * External wrapper for ZSTD_row_update_internal(). Used for filling the h= ashtable during dictionary * processing. */ -void ZSTD_row_update(ZSTD_matchState_t* const ms, const BYTE* ip) { +void ZSTD_row_update(ZSTD_MatchState_t* const ms, const BYTE* ip) { const U32 rowLog =3D BOUNDED(4, ms->cParams.searchLog, 6); const U32 rowMask =3D (1u << rowLog) - 1; const U32 mls =3D MIN(ms->cParams.minMatch, 6 /* mls caps out at 6 */); =20 DEBUGLOG(5, "ZSTD_row_update(), rowLog=3D%u", rowLog); - ZSTD_row_update_internal(ms, ip, mls, rowLog, rowMask, 0 /* dont use c= ache */); + ZSTD_row_update_internal(ms, ip, mls, rowLog, rowMask, 0 /* don't use = cache */); +} + +/* Returns the number of mask bits that represent a single row entry (the + * "group width"). Not all architectures have an easy movemask instruction, + * so giving each entry its own group of bits makes the match mask easier + * and faster to iterate over. + */ +FORCE_INLINE_TEMPLATE U32 +ZSTD_row_matchMaskGroupWidth(const U32 rowEntries) +{ + assert((rowEntries =3D=3D 16) || (rowEntries =3D=3D 32) || rowEntries = =3D=3D 64); + assert(rowEntries <=3D ZSTD_ROW_HASH_MAX_ENTRIES); + (void)rowEntries; +#if defined(ZSTD_ARCH_ARM_NEON) + /* NEON path only works for little endian */ + if (!MEM_isLittleEndian()) { + return 1; + } + if (rowEntries =3D=3D 16) { + return 4; + } + if (rowEntries =3D=3D 32) { + return 2; + } + if (rowEntries =3D=3D 64) { + return 1; + } +#endif + return 1; } =20 #if defined(ZSTD_ARCH_X86_SSE2) @@ -994,71 +1004,82 @@ ZSTD_row_getSSEMask(int nbChunks, const BYTE* const = src, const BYTE tag, const U } #endif =20 -/* Returns a ZSTD_VecMask (U32) that has the nth bit set to 1 if the newly= -computed "tag" matches - * the hash at the nth position in a row of the tagTable. - * Each row is a circular buffer beginning at the value of "head".
So we m= ust rotate the "matches" bitfield - * to match up with the actual layout of the entries within the hashTable = */ +#if defined(ZSTD_ARCH_ARM_NEON) +FORCE_INLINE_TEMPLATE ZSTD_VecMask +ZSTD_row_getNEONMask(const U32 rowEntries, const BYTE* const src, const BY= TE tag, const U32 headGrouped) +{ + assert((rowEntries =3D=3D 16) || (rowEntries =3D=3D 32) || rowEntries = =3D=3D 64); + if (rowEntries =3D=3D 16) { + /* vshrn_n_u16 shifts by 4 every u16 and narrows to 8 lower bits. + * After that groups of 4 bits represent the equalMask. We lower + * all bits except the highest in these groups by doing AND with + * 0x88 =3D 0b10001000. + */ + const uint8x16_t chunk =3D vld1q_u8(src); + const uint16x8_t equalMask =3D vreinterpretq_u16_u8(vceqq_u8(chunk= , vdupq_n_u8(tag))); + const uint8x8_t res =3D vshrn_n_u16(equalMask, 4); + const U64 matches =3D vget_lane_u64(vreinterpret_u64_u8(res), 0); + return ZSTD_rotateRight_U64(matches, headGrouped) & 0x888888888888= 8888ull; + } else if (rowEntries =3D=3D 32) { + /* Same idea as with rowEntries =3D=3D 16 but doing AND with + * 0x55 =3D 0b01010101. + */ + const uint16x8x2_t chunk =3D vld2q_u16((const uint16_t*)(const voi= d*)src); + const uint8x16_t chunk0 =3D vreinterpretq_u8_u16(chunk.val[0]); + const uint8x16_t chunk1 =3D vreinterpretq_u8_u16(chunk.val[1]); + const uint8x16_t dup =3D vdupq_n_u8(tag); + const uint8x8_t t0 =3D vshrn_n_u16(vreinterpretq_u16_u8(vceqq_u8(c= hunk0, dup)), 6); + const uint8x8_t t1 =3D vshrn_n_u16(vreinterpretq_u16_u8(vceqq_u8(c= hunk1, dup)), 6); + const uint8x8_t res =3D vsli_n_u8(t0, t1, 4); + const U64 matches =3D vget_lane_u64(vreinterpret_u64_u8(res), 0) ; + return ZSTD_rotateRight_U64(matches, headGrouped) & 0x555555555555= 5555ull; + } else { /* rowEntries =3D=3D 64 */ + const uint8x16x4_t chunk =3D vld4q_u8(src); + const uint8x16_t dup =3D vdupq_n_u8(tag); + const uint8x16_t cmp0 =3D vceqq_u8(chunk.val[0], dup); + const uint8x16_t cmp1 =3D vceqq_u8(chunk.val[1], dup); + const uint8x16_t cmp2 =3D vceqq_u8(chunk.val[2], dup); + const uint8x16_t cmp3 =3D vceqq_u8(chunk.val[3], dup); + + const uint8x16_t t0 =3D vsriq_n_u8(cmp1, cmp0, 1); + const uint8x16_t t1 =3D vsriq_n_u8(cmp3, cmp2, 1); + const uint8x16_t t2 =3D vsriq_n_u8(t1, t0, 2); + const uint8x16_t t3 =3D vsriq_n_u8(t2, t2, 4); + const uint8x8_t t4 =3D vshrn_n_u16(vreinterpretq_u16_u8(t3), 4); + const U64 matches =3D vget_lane_u64(vreinterpret_u64_u8(t4), 0); + return ZSTD_rotateRight_U64(matches, headGrouped); + } +} +#endif + +/* Returns a ZSTD_VecMask (U64) that has the nth group (determined by + * ZSTD_row_matchMaskGroupWidth) of bits set to 1 if the newly-computed "t= ag" + * matches the hash at the nth position in a row of the tagTable. + * Each row is a circular buffer beginning at the value of "headGrouped". 
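
The "grouped" bookkeeping above can be made concrete with a small example; the helper below mirrors the ((headGrouped + ZSTD_VecMask_next(matches)) / groupWidth) & rowMask expression used in ZSTD_RowFindBestMatch() further down (helper name invented for the sketch):

    #include <assert.h>
    #include <stdint.h>

    /* groupWidth is 1 for SSE2/SWAR masks; NEON rows of 16 or 32 entries
     * give each entry a group of 4 or 2 mask bits respectively. */
    static uint32_t entryFromGroupedBit(uint32_t headGrouped, uint32_t bitPos,
                                        uint32_t groupWidth, uint32_t rowMask)
    {
        return ((headGrouped + bitPos) / groupWidth) & rowMask;
    }

    int main(void)
    {
        /* 16-entry row on NEON: head at entry 5 => headGrouped = 20; a
         * match bit 8 grouped positions later maps to entry (20+8)/4 = 7. */
        assert(entryFromGroupedBit(20, 8, 4, 15) == 7);
        return 0;
    }
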
= So we + * must rotate the "matches" bitfield to match up with the actual layout o= f the + * entries within the hashTable */ FORCE_INLINE_TEMPLATE ZSTD_VecMask -ZSTD_row_getMatchMask(const BYTE* const tagRow, const BYTE tag, const U32 = head, const U32 rowEntries) +ZSTD_row_getMatchMask(const BYTE* const tagRow, const BYTE tag, const U32 = headGrouped, const U32 rowEntries) { - const BYTE* const src =3D tagRow + ZSTD_ROW_HASH_TAG_OFFSET; + const BYTE* const src =3D tagRow; assert((rowEntries =3D=3D 16) || (rowEntries =3D=3D 32) || rowEntries = =3D=3D 64); assert(rowEntries <=3D ZSTD_ROW_HASH_MAX_ENTRIES); + assert(ZSTD_row_matchMaskGroupWidth(rowEntries) * rowEntries <=3D size= of(ZSTD_VecMask) * 8); =20 #if defined(ZSTD_ARCH_X86_SSE2) =20 - return ZSTD_row_getSSEMask(rowEntries / 16, src, tag, head); + return ZSTD_row_getSSEMask(rowEntries / 16, src, tag, headGrouped); =20 #else /* SW or NEON-LE */ =20 # if defined(ZSTD_ARCH_ARM_NEON) /* This NEON path only works for little endian - otherwise use SWAR belo= w */ if (MEM_isLittleEndian()) { - if (rowEntries =3D=3D 16) { - const uint8x16_t chunk =3D vld1q_u8(src); - const uint16x8_t equalMask =3D vreinterpretq_u16_u8(vceqq_u8(c= hunk, vdupq_n_u8(tag))); - const uint16x8_t t0 =3D vshlq_n_u16(equalMask, 7); - const uint32x4_t t1 =3D vreinterpretq_u32_u16(vsriq_n_u16(t0, = t0, 14)); - const uint64x2_t t2 =3D vreinterpretq_u64_u32(vshrq_n_u32(t1, = 14)); - const uint8x16_t t3 =3D vreinterpretq_u8_u64(vsraq_n_u64(t2, t= 2, 28)); - const U16 hi =3D (U16)vgetq_lane_u8(t3, 8); - const U16 lo =3D (U16)vgetq_lane_u8(t3, 0); - return ZSTD_rotateRight_U16((hi << 8) | lo, head); - } else if (rowEntries =3D=3D 32) { - const uint16x8x2_t chunk =3D vld2q_u16((const U16*)(const void= *)src); - const uint8x16_t chunk0 =3D vreinterpretq_u8_u16(chunk.val[0]); - const uint8x16_t chunk1 =3D vreinterpretq_u8_u16(chunk.val[1]); - const uint8x16_t equalMask0 =3D vceqq_u8(chunk0, vdupq_n_u8(ta= g)); - const uint8x16_t equalMask1 =3D vceqq_u8(chunk1, vdupq_n_u8(ta= g)); - const int8x8_t pack0 =3D vqmovn_s16(vreinterpretq_s16_u8(equal= Mask0)); - const int8x8_t pack1 =3D vqmovn_s16(vreinterpretq_s16_u8(equal= Mask1)); - const uint8x8_t t0 =3D vreinterpret_u8_s8(pack0); - const uint8x8_t t1 =3D vreinterpret_u8_s8(pack1); - const uint8x8_t t2 =3D vsri_n_u8(t1, t0, 2); - const uint8x8x2_t t3 =3D vuzp_u8(t2, t0); - const uint8x8_t t4 =3D vsri_n_u8(t3.val[1], t3.val[0], 4); - const U32 matches =3D vget_lane_u32(vreinterpret_u32_u8(t4), 0= ); - return ZSTD_rotateRight_U32(matches, head); - } else { /* rowEntries =3D=3D 64 */ - const uint8x16x4_t chunk =3D vld4q_u8(src); - const uint8x16_t dup =3D vdupq_n_u8(tag); - const uint8x16_t cmp0 =3D vceqq_u8(chunk.val[0], dup); - const uint8x16_t cmp1 =3D vceqq_u8(chunk.val[1], dup); - const uint8x16_t cmp2 =3D vceqq_u8(chunk.val[2], dup); - const uint8x16_t cmp3 =3D vceqq_u8(chunk.val[3], dup); - - const uint8x16_t t0 =3D vsriq_n_u8(cmp1, cmp0, 1); - const uint8x16_t t1 =3D vsriq_n_u8(cmp3, cmp2, 1); - const uint8x16_t t2 =3D vsriq_n_u8(t1, t0, 2); - const uint8x16_t t3 =3D vsriq_n_u8(t2, t2, 4); - const uint8x8_t t4 =3D vshrn_n_u16(vreinterpretq_u16_u8(t3), 4= ); - const U64 matches =3D vget_lane_u64(vreinterpret_u64_u8(t4), 0= ); - return ZSTD_rotateRight_U64(matches, head); - } + return ZSTD_row_getNEONMask(rowEntries, src, tag, headGrouped); } # endif /* ZSTD_ARCH_ARM_NEON */ /* SWAR */ - { const size_t chunkSize =3D sizeof(size_t); + { const int chunkSize =3D sizeof(size_t); const size_t shiftAmount =3D ((chunkSize * 8) - 
chunkSize); const size_t xFF =3D ~((size_t)0); const size_t x01 =3D xFF / 0xFF; @@ -1091,11 +1112,11 @@ ZSTD_row_getMatchMask(const BYTE* const tagRow, con= st BYTE tag, const U32 head, } matches =3D ~matches; if (rowEntries =3D=3D 16) { - return ZSTD_rotateRight_U16((U16)matches, head); + return ZSTD_rotateRight_U16((U16)matches, headGrouped); } else if (rowEntries =3D=3D 32) { - return ZSTD_rotateRight_U32((U32)matches, head); + return ZSTD_rotateRight_U32((U32)matches, headGrouped); } else { - return ZSTD_rotateRight_U64((U64)matches, head); + return ZSTD_rotateRight_U64((U64)matches, headGrouped); } } #endif @@ -1103,29 +1124,30 @@ ZSTD_row_getMatchMask(const BYTE* const tagRow, con= st BYTE tag, const U32 head, =20 /* The high-level approach of the SIMD row based match finder is as follow= s: * - Figure out where to insert the new entry: - * - Generate a hash from a byte along with an additional 1-byte "sho= rt hash". The additional byte is our "tag" - * - The hashTable is effectively split into groups or "rows" of 16 o= r 32 entries of U32, and the hash determines + * - Generate a hash for the current input position and split it into a one-byte tag and `rowHashLog` bits of index. + * - The hash is salted by a value that changes on every context= reset, so when the same table is used + * we will avoid collisions that would otherwise slow us down = by introducing phantom matches. + * - The hashTable is effectively split into groups or "rows" of 15 o= r 31 entries of U32, and the index determines * which row to insert into. - * - Determine the correct position within the row to insert the entr= y into. Each row of 16 or 32 can - * be considered as a circular buffer with a "head" index that resi= des in the tagTable. - * - Also insert the "tag" into the equivalent row and position in th= e tagTable. - * - Note: The tagTable has 17 or 33 1-byte entries per row, due = to 16 or 32 tags, and 1 "head" entry. - * The 17 or 33 entry rows are spaced out to occur every = 32 or 64 bytes, respectively, - * for alignment/performance reasons, leaving some bytes = unused. - * - Use SIMD to efficiently compare the tags in the tagTable to the 1-byt= e "short hash" and + * - Determine the correct position within the row to insert the entr= y into. Each row of 15 or 31 can + * be considered as a circular buffer with a "head" index that resi= des in the tagTable (overall 16 or 32 bytes + * per row). + * - Use SIMD to efficiently compare the tags in the tagTable to the 1-byt= e tag calculated for the position and * generate a bitfield that we can cycle through to check the collisions= in the hash table. * - Pick the longest match. + * - Insert the tag into the equivalent row and position in the tagTable. */
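
As a rough standalone model of the scheme just described (scalar only; no salting, SIMD, or prefetching, and all names invented for the sketch):

    #include <stdint.h>

    #define ROW_ENTRIES 16             /* 16 index slots + 16 tag bytes */
    #define ROW_MASK    (ROW_ENTRIES - 1)

    typedef struct {
        uint32_t idx[ROW_ENTRIES];     /* positions in the input window */
        uint8_t  tag[ROW_ENTRIES];     /* 1-byte tags; tag[0] is the "head" */
    } ToyRow;

    /* Insert: step the circular head backwards, skipping slot 0, which is
     * reserved for the head byte itself (mirrors ZSTD_row_nextIndex). */
    static void toyRowInsert(ToyRow* row, uint8_t tag, uint32_t position)
    {
        uint32_t next = (uint32_t)(row->tag[0] - 1) & ROW_MASK;
        if (next == 0) next = ROW_MASK;    /* skip first position */
        row->tag[0]    = (uint8_t)next;
        row->tag[next] = tag;
        row->idx[next] = position;
    }

    /* Search: collect candidates whose tag matches; the real code builds
     * this set as a SIMD bitmask and walks it newest-first. */
    static int toyRowSearch(const ToyRow* row, uint8_t tag,
                            uint32_t* candidates, int maxCandidates)
    {
        int n = 0;
        uint32_t e;
        for (e = 1; e < ROW_ENTRIES && n < maxCandidates; ++e) {
            if (row->tag[e] == tag) candidates[n++] = row->idx[e];
        }
        return n;
    }

Because the vector compare inspects all 16 tag bytes, including the head byte in slot 0, the search loops below discard matchPos == 0; that is also why insertion only cycles through [1, rowMask].
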
FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR size_t ZSTD_RowFindBestMatch( - ZSTD_matchState_t* ms, + ZSTD_MatchState_t* ms, const BYTE* const ip, const BYTE* const iLimit, size_t* offsetPtr, const U32 mls, const ZSTD_dictMode_e dictMode, const U32 rowLog) { U32* const hashTable =3D ms->hashTable; - U16* const tagTable =3D ms->tagTable; + BYTE* const tagTable =3D ms->tagTable; U32* const hashCache =3D ms->hashCache; const U32 hashLog =3D ms->rowHashLog; const ZSTD_compressionParameters* const cParams =3D &ms->cParams; @@ -1143,11 +1165,14 @@ size_t ZSTD_RowFindBestMatch( const U32 rowEntries =3D (1U << rowLog); const U32 rowMask =3D rowEntries - 1; const U32 cappedSearchLog =3D MIN(cParams->searchLog, rowLog); /* nb o= f searches is capped at nb entries per row */ + const U32 groupWidth =3D ZSTD_row_matchMaskGroupWidth(rowEntries); + const U64 hashSalt =3D ms->hashSalt; U32 nbAttempts =3D 1U << cappedSearchLog; size_t ml=3D4-1; + U32 hash; =20 /* DMS/DDS variables that may be referenced later */ - const ZSTD_matchState_t* const dms =3D ms->dictMatchState; + const ZSTD_MatchState_t* const dms =3D ms->dictMatchState; =20 /* Initialize the following variables to satisfy static analyzer */ size_t ddsIdx =3D 0; @@ -1168,7 +1193,7 @@ size_t ZSTD_RowFindBestMatch( if (dictMode =3D=3D ZSTD_dictMatchState) { /* Prefetch DMS rows */ U32* const dmsHashTable =3D dms->hashTable; - U16* const dmsTagTable =3D dms->tagTable; + BYTE* const dmsTagTable =3D dms->tagTable; U32 const dmsHash =3D (U32)ZSTD_hashPtr(ip, dms->rowHashLog + ZSTD= _ROW_HASH_TAG_BITS, mls); U32 const dmsRelRow =3D (dmsHash >> ZSTD_ROW_HASH_TAG_BITS) << row= Log; dmsTag =3D dmsHash & ZSTD_ROW_HASH_TAG_MASK; @@ -1178,23 +1203,34 @@ size_t ZSTD_RowFindBestMatch( } =20 /* Update the hashTable and tagTable up to (but not including) ip */ - ZSTD_row_update_internal(ms, ip, mls, rowLog, rowMask, 1 /* useCache *= /); + if (!ms->lazySkipping) { + ZSTD_row_update_internal(ms, ip, mls, rowLog, rowMask, 1 /* useCac= he */); + hash =3D ZSTD_row_nextCachedHash(hashCache, hashTable, tagTable, b= ase, curr, hashLog, rowLog, mls, hashSalt); + } else { + /* Stop inserting every position when in the lazy skipping mode. + * The hash cache is also not kept up to date in this mode.
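
The skipping cutoff follows directly from the step formula used where this mode is entered. A quick check of that arithmetic, assuming kSearchStrength and kLazySkippingStep are both 8, which is what makes the 2KB figure in the comment come out:

    #include <assert.h>

    #define kSearchStrength   8   /* assumed value */
    #define kLazySkippingStep 8   /* assumed value */

    int main(void)
    {
        /* step = ((ip - anchor) >> kSearchStrength) + 1 */
        unsigned long const gapOff = 2047, gapOn = 2048;
        assert(((gapOff >> kSearchStrength) + 1) <= kLazySkippingStep);
        assert(((gapOn  >> kSearchStrength) + 1) >  kLazySkippingStep);
        return 0;   /* skipping engages once the gap reaches 2KB */
    }
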
+ */ + hash =3D (U32)ZSTD_hashPtrSalted(ip, hashLog + ZSTD_ROW_HASH_TAG_B= ITS, mls, hashSalt); + ms->nextToUpdate =3D curr; + } + ms->hashSaltEntropy +=3D hash; /* collect salt entropy */ + { /* Get the hash for ip, compute the appropriate row */ - U32 const hash =3D ZSTD_row_nextCachedHash(hashCache, hashTable, t= agTable, base, curr, hashLog, rowLog, mls); U32 const relRow =3D (hash >> ZSTD_ROW_HASH_TAG_BITS) << rowLog; U32 const tag =3D hash & ZSTD_ROW_HASH_TAG_MASK; U32* const row =3D hashTable + relRow; BYTE* tagRow =3D (BYTE*)(tagTable + relRow); - U32 const head =3D *tagRow & rowMask; + U32 const headGrouped =3D (*tagRow & rowMask) * groupWidth; U32 matchBuffer[ZSTD_ROW_HASH_MAX_ENTRIES]; size_t numMatches =3D 0; size_t currMatch =3D 0; - ZSTD_VecMask matches =3D ZSTD_row_getMatchMask(tagRow, (BYTE)tag, = head, rowEntries); + ZSTD_VecMask matches =3D ZSTD_row_getMatchMask(tagRow, (BYTE)tag, = headGrouped, rowEntries); =20 /* Cycle through the matches and prefetch */ - for (; (matches > 0) && (nbAttempts > 0); --nbAttempts, matches &= =3D (matches - 1)) { - U32 const matchPos =3D (head + ZSTD_VecMask_next(matches)) & r= owMask; + for (; (matches > 0) && (nbAttempts > 0); matches &=3D (matches - = 1)) { + U32 const matchPos =3D ((headGrouped + ZSTD_VecMask_next(match= es)) / groupWidth) & rowMask; U32 const matchIndex =3D row[matchPos]; + if(matchPos =3D=3D 0) continue; assert(numMatches < rowEntries); if (matchIndex < lowLimit) break; @@ -1204,13 +1240,14 @@ size_t ZSTD_RowFindBestMatch( PREFETCH_L1(dictBase + matchIndex); } matchBuffer[numMatches++] =3D matchIndex; + --nbAttempts; } =20 /* Speed opt: insert current byte into hashtable too. This allows = us to avoid one iteration of the loop in ZSTD_row_update_internal() at the next search. */ { U32 const pos =3D ZSTD_row_nextIndex(tagRow, rowMask); - tagRow[pos + ZSTD_ROW_HASH_TAG_OFFSET] =3D (BYTE)tag; + tagRow[pos] =3D (BYTE)tag; row[pos] =3D ms->nextToUpdate++; } =20 @@ -1224,7 +1261,8 @@ size_t ZSTD_RowFindBestMatch( if ((dictMode !=3D ZSTD_extDict) || matchIndex >=3D dictLimit)= { const BYTE* const match =3D base + matchIndex; assert(matchIndex >=3D dictLimit); /* ensures this is tr= ue if dictMode !=3D ZSTD_extDict */ - if (match[ml] =3D=3D ip[ml]) /* potentially better */ + /* read 4B starting from (match + ml + 1 - sizeof(U32)) */ + if (MEM_read32(match + ml - 3) =3D=3D MEM_read32(ip + ml -= 3)) /* potentially better */ currentMl =3D ZSTD_count(ip, match, iLimit); } else { const BYTE* const match =3D dictBase + matchIndex; @@ -1236,7 +1274,7 @@ size_t ZSTD_RowFindBestMatch( /* Save best solution */ if (currentMl > ml) { ml =3D currentMl; - *offsetPtr =3D STORE_OFFSET(curr - matchIndex); + *offsetPtr =3D OFFSET_TO_OFFBASE(curr - matchIndex); if (ip+currentMl =3D=3D iLimit) break; /* best possible, a= voids read overflow on next attempt */ } } @@ -1254,19 +1292,21 @@ size_t ZSTD_RowFindBestMatch( const U32 dmsSize =3D (U32)(dmsEnd - dmsBase); const U32 dmsIndexDelta =3D dictLimit - dmsSize; =20 - { U32 const head =3D *dmsTagRow & rowMask; + { U32 const headGrouped =3D (*dmsTagRow & rowMask) * groupWidth; U32 matchBuffer[ZSTD_ROW_HASH_MAX_ENTRIES]; size_t numMatches =3D 0; size_t currMatch =3D 0; - ZSTD_VecMask matches =3D ZSTD_row_getMatchMask(dmsTagRow, (BYT= E)dmsTag, head, rowEntries); + ZSTD_VecMask matches =3D ZSTD_row_getMatchMask(dmsTagRow, (BYT= E)dmsTag, headGrouped, rowEntries); =20 - for (; (matches > 0) && (nbAttempts > 0); --nbAttempts, matche= s &=3D (matches - 1)) { - U32 const matchPos =3D (head + 
ZSTD_VecMask_next(matches))= & rowMask; + for (; (matches > 0) && (nbAttempts > 0); matches &=3D (matche= s - 1)) { + U32 const matchPos =3D ((headGrouped + ZSTD_VecMask_next(m= atches)) / groupWidth) & rowMask; U32 const matchIndex =3D dmsRow[matchPos]; + if(matchPos =3D=3D 0) continue; if (matchIndex < dmsLowestIndex) break; PREFETCH_L1(dmsBase + matchIndex); matchBuffer[numMatches++] =3D matchIndex; + --nbAttempts; } =20 /* Return the longest match */ @@ -1285,7 +1325,7 @@ size_t ZSTD_RowFindBestMatch( if (currentMl > ml) { ml =3D currentMl; assert(curr > matchIndex + dmsIndexDelta); - *offsetPtr =3D STORE_OFFSET(curr - (matchIndex + dmsIn= dexDelta)); + *offsetPtr =3D OFFSET_TO_OFFBASE(curr - (matchIndex + = dmsIndexDelta)); if (ip+currentMl =3D=3D iLimit) break; } } @@ -1301,7 +1341,7 @@ size_t ZSTD_RowFindBestMatch( * ZSTD_searchMax() dispatches to the correct implementation function. * * TODO: The start of the search function involves loading and calculating= a - * bunch of constants from the ZSTD_matchState_t. These computations could= be + * bunch of constants from the ZSTD_MatchState_t. These computations could= be * done in an initialization function, and saved somewhere in the match st= ate. * Then we could pass a pointer to the saved state instead of the match st= ate, * and avoid duplicate computations. @@ -1325,7 +1365,7 @@ size_t ZSTD_RowFindBestMatch( =20 #define GEN_ZSTD_BT_SEARCH_FN(dictMode, mls) = \ ZSTD_SEARCH_FN_ATTRS size_t ZSTD_BT_SEARCH_FN(dictMode, mls)( = \ - ZSTD_matchState_t* ms, = \ + ZSTD_MatchState_t* ms, = \ const BYTE* ip, const BYTE* const iLimit, = \ size_t* offBasePtr) = \ { = \ @@ -1335,7 +1375,7 @@ size_t ZSTD_RowFindBestMatch( =20 #define GEN_ZSTD_HC_SEARCH_FN(dictMode, mls) = \ ZSTD_SEARCH_FN_ATTRS size_t ZSTD_HC_SEARCH_FN(dictMode, mls)( = \ - ZSTD_matchState_t* ms, = \ + ZSTD_MatchState_t* ms, = \ const BYTE* ip, const BYTE* const iLimit, = \ size_t* offsetPtr) = \ { = \ @@ -1345,7 +1385,7 @@ size_t ZSTD_RowFindBestMatch( =20 #define GEN_ZSTD_ROW_SEARCH_FN(dictMode, mls, rowLog) = \ ZSTD_SEARCH_FN_ATTRS size_t ZSTD_ROW_SEARCH_FN(dictMode, mls, rowLog)(= \ - ZSTD_matchState_t* ms, = \ + ZSTD_MatchState_t* ms, = \ const BYTE* ip, const BYTE* const iLimit, = \ size_t* offsetPtr) = \ { = \ @@ -1446,7 +1486,7 @@ typedef enum { search_hashChain=3D0, search_binaryTre= e=3D1, search_rowHash=3D2 } searc * If a match is found its offset is stored in @p offsetPtr. 
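
One rename runs through this whole hunk: STORE_OFFSET()/STORE_REPCODE_1 become OFFSET_TO_OFFBASE()/REPCODE1_TO_OFFBASE, and the search functions report a single "offBase" value through offsetPtr that encodes either a repcode or a real offset. A self-contained sketch of that encoding, assuming the conventional three zstd repcodes (the real macros live in zstd_compress_internal.h; names below are invented):

    #include <assert.h>
    #include <stddef.h>

    #define REP_NUM 3   /* number of repeat offsets (ZSTD_REP_NUM) */

    /* repcodes map to offBase 1..3; real offsets are shifted past them */
    static size_t offsetToOffBase(size_t o)  { assert(o > 0); return o + REP_NUM; }
    static size_t repcode1ToOffBase(void)    { return 1; }
    static int    offBaseIsOffset(size_t ob) { return ob > REP_NUM; }
    static size_t offBaseToOffset(size_t ob) { assert(ob > REP_NUM); return ob - REP_NUM; }

    int main(void)
    {
        size_t const rep = repcode1ToOffBase();
        size_t const off = offsetToOffBase(100);
        assert(!offBaseIsOffset(rep));
        assert(offBaseIsOffset(off) && offBaseToOffset(off) == 100);
        return 0;
    }

Folding both cases into one value lets ZSTD_searchMax() report its result through the single offsetPtr described above.
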
*/ FORCE_INLINE_TEMPLATE size_t ZSTD_searchMax( - ZSTD_matchState_t* ms, + ZSTD_MatchState_t* ms, const BYTE* ip, const BYTE* iend, size_t* offsetPtr, @@ -1472,9 +1512,10 @@ FORCE_INLINE_TEMPLATE size_t ZSTD_searchMax( * Common parser - lazy strategy *********************************/ =20 -FORCE_INLINE_TEMPLATE size_t -ZSTD_compressBlock_lazy_generic( - ZSTD_matchState_t* ms, seqStore_t* seqStore, +FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t ZSTD_compressBlock_lazy_generic( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize, const searchMethod_e searchMethod, const U32 depth, @@ -1491,12 +1532,13 @@ ZSTD_compressBlock_lazy_generic( const U32 mls =3D BOUNDED(4, ms->cParams.minMatch, 6); const U32 rowLog =3D BOUNDED(4, ms->cParams.searchLog, 6); =20 - U32 offset_1 =3D rep[0], offset_2 =3D rep[1], savedOffset=3D0; + U32 offset_1 =3D rep[0], offset_2 =3D rep[1]; + U32 offsetSaved1 =3D 0, offsetSaved2 =3D 0; =20 const int isDMS =3D dictMode =3D=3D ZSTD_dictMatchState; const int isDDS =3D dictMode =3D=3D ZSTD_dedicatedDictSearch; const int isDxS =3D isDMS || isDDS; - const ZSTD_matchState_t* const dms =3D ms->dictMatchState; + const ZSTD_MatchState_t* const dms =3D ms->dictMatchState; const U32 dictLowestIndex =3D isDxS ? dms->window.dictLimit : 0; const BYTE* const dictBase =3D isDxS ? dms->window.base : NULL; const BYTE* const dictLowest =3D isDxS ? dictBase + dictLowestIndex = : NULL; @@ -1512,8 +1554,8 @@ ZSTD_compressBlock_lazy_generic( U32 const curr =3D (U32)(ip - base); U32 const windowLow =3D ZSTD_getLowestPrefixIndex(ms, curr, ms->cP= arams.windowLog); U32 const maxRep =3D curr - windowLow; - if (offset_2 > maxRep) savedOffset =3D offset_2, offset_2 =3D 0; - if (offset_1 > maxRep) savedOffset =3D offset_1, offset_1 =3D 0; + if (offset_2 > maxRep) offsetSaved2 =3D offset_2, offset_2 =3D 0; + if (offset_1 > maxRep) offsetSaved1 =3D offset_1, offset_1 =3D 0; } if (isDxS) { /* dictMatchState repCode checks don't currently handle repCode = =3D=3D 0 @@ -1522,10 +1564,11 @@ ZSTD_compressBlock_lazy_generic( assert(offset_2 <=3D dictAndPrefixLength); } =20 + /* Reset the lazy skipping state */ + ms->lazySkipping =3D 0; + if (searchMethod =3D=3D search_rowHash) { - ZSTD_row_fillHashCache(ms, base, rowLog, - MIN(ms->cParams.minMatch, 6 /* mls caps out at= 6 */), - ms->nextToUpdate, ilimit); + ZSTD_row_fillHashCache(ms, base, rowLog, mls, ms->nextToUpdate, il= imit); } =20 /* Match Loop */ @@ -1537,7 +1580,7 @@ ZSTD_compressBlock_lazy_generic( #endif while (ip < ilimit) { size_t matchLength=3D0; - size_t offcode=3DSTORE_REPCODE_1; + size_t offBase =3D REPCODE1_TO_OFFBASE; const BYTE* start=3Dip+1; DEBUGLOG(7, "search baseline (depth 0)"); =20 @@ -1548,7 +1591,7 @@ ZSTD_compressBlock_lazy_generic( && repIndex < prefixLowestIndex) ? 
dictBase + (repIndex - dictIndexDelta) : base + repIndex; - if (((U32)((prefixLowestIndex-1) - repIndex) >=3D 3 /* intenti= onal underflow */) + if ((ZSTD_index_overlap_check(prefixLowestIndex, repIndex)) && (MEM_read32(repMatch) =3D=3D MEM_read32(ip+1)) ) { const BYTE* repMatchEnd =3D repIndex < prefixLowestIndex ?= dictEnd : iend; matchLength =3D ZSTD_count_2segments(ip+1+4, repMatch+4, i= end, repMatchEnd, prefixLowest) + 4; @@ -1562,14 +1605,23 @@ ZSTD_compressBlock_lazy_generic( } =20 /* first search (depth 0) */ - { size_t offsetFound =3D 999999999; - size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &offsetFound= , mls, rowLog, searchMethod, dictMode); + { size_t offbaseFound =3D 999999999; + size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &offbaseFoun= d, mls, rowLog, searchMethod, dictMode); if (ml2 > matchLength) - matchLength =3D ml2, start =3D ip, offcode=3DoffsetFound; + matchLength =3D ml2, start =3D ip, offBase =3D offbaseFoun= d; } =20 if (matchLength < 4) { - ip +=3D ((ip-anchor) >> kSearchStrength) + 1; /* jump faster= over incompressible sections */ + size_t const step =3D ((size_t)(ip-anchor) >> kSearchStrength)= + 1; /* jump faster over incompressible sections */; + ip +=3D step; + /* Enter the lazy skipping mode once we are skipping more than= 8 bytes at a time. + * In this mode we stop inserting every position into our tabl= es, and only insert + * positions that we search, which is one in step positions. + * The exact cutoff is flexible, I've just chosen a number tha= t is reasonably high, + * so we minimize the compression ratio loss in "normal" scena= rios. This mode gets + * triggered once we've gone 2KB without finding any matches. + */ + ms->lazySkipping =3D step > kLazySkippingStep; continue; } =20 @@ -1579,34 +1631,34 @@ ZSTD_compressBlock_lazy_generic( DEBUGLOG(7, "search depth 1"); ip ++; if ( (dictMode =3D=3D ZSTD_noDict) - && (offcode) && ((offset_1>0) & (MEM_read32(ip) =3D=3D MEM_r= ead32(ip - offset_1)))) { + && (offBase) && ((offset_1>0) & (MEM_read32(ip) =3D=3D MEM_r= ead32(ip - offset_1)))) { size_t const mlRep =3D ZSTD_count(ip+4, ip+4-offset_1, ien= d) + 4; int const gain2 =3D (int)(mlRep * 3); - int const gain1 =3D (int)(matchLength*3 - ZSTD_highbit32((= U32)STORED_TO_OFFBASE(offcode)) + 1); + int const gain1 =3D (int)(matchLength*3 - ZSTD_highbit32((= U32)offBase) + 1); if ((mlRep >=3D 4) && (gain2 > gain1)) - matchLength =3D mlRep, offcode =3D STORE_REPCODE_1, st= art =3D ip; + matchLength =3D mlRep, offBase =3D REPCODE1_TO_OFFBASE= , start =3D ip; } if (isDxS) { const U32 repIndex =3D (U32)(ip - base) - offset_1; const BYTE* repMatch =3D repIndex < prefixLowestIndex ? dictBase + (repIndex - dictIndexDelta) : base + repIndex; - if (((U32)((prefixLowestIndex-1) - repIndex) >=3D 3 /* int= entional underflow */) + if ((ZSTD_index_overlap_check(prefixLowestIndex, repIndex)) && (MEM_read32(repMatch) =3D=3D MEM_read32(ip)) ) { const BYTE* repMatchEnd =3D repIndex < prefixLowestInd= ex ? 
dictEnd : iend; size_t const mlRep =3D ZSTD_count_2segments(ip+4, repM= atch+4, iend, repMatchEnd, prefixLowest) + 4; int const gain2 =3D (int)(mlRep * 3); - int const gain1 =3D (int)(matchLength*3 - ZSTD_highbit= 32((U32)STORED_TO_OFFBASE(offcode)) + 1); + int const gain1 =3D (int)(matchLength*3 - ZSTD_highbit= 32((U32)offBase) + 1); if ((mlRep >=3D 4) && (gain2 > gain1)) - matchLength =3D mlRep, offcode =3D STORE_REPCODE_1= , start =3D ip; + matchLength =3D mlRep, offBase =3D REPCODE1_TO_OFF= BASE, start =3D ip; } } - { size_t offset2=3D999999999; - size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &offset2= , mls, rowLog, searchMethod, dictMode); - int const gain2 =3D (int)(ml2*4 - ZSTD_highbit32((U32)STOR= ED_TO_OFFBASE(offset2))); /* raw approx */ - int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit32((= U32)STORED_TO_OFFBASE(offcode)) + 4); + { size_t ofbCandidate=3D999999999; + size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &ofbCand= idate, mls, rowLog, searchMethod, dictMode); + int const gain2 =3D (int)(ml2*4 - ZSTD_highbit32((U32)ofbC= andidate)); /* raw approx */ + int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit32((= U32)offBase) + 4); if ((ml2 >=3D 4) && (gain2 > gain1)) { - matchLength =3D ml2, offcode =3D offset2, start =3D ip; + matchLength =3D ml2, offBase =3D ofbCandidate, start = =3D ip; continue; /* search a better one */ } } =20 @@ -1615,34 +1667,34 @@ ZSTD_compressBlock_lazy_generic( DEBUGLOG(7, "search depth 2"); ip ++; if ( (dictMode =3D=3D ZSTD_noDict) - && (offcode) && ((offset_1>0) & (MEM_read32(ip) =3D=3D M= EM_read32(ip - offset_1)))) { + && (offBase) && ((offset_1>0) & (MEM_read32(ip) =3D=3D M= EM_read32(ip - offset_1)))) { size_t const mlRep =3D ZSTD_count(ip+4, ip+4-offset_1,= iend) + 4; int const gain2 =3D (int)(mlRep * 4); - int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit= 32((U32)STORED_TO_OFFBASE(offcode)) + 1); + int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit= 32((U32)offBase) + 1); if ((mlRep >=3D 4) && (gain2 > gain1)) - matchLength =3D mlRep, offcode =3D STORE_REPCODE_1= , start =3D ip; + matchLength =3D mlRep, offBase =3D REPCODE1_TO_OFF= BASE, start =3D ip; } if (isDxS) { const U32 repIndex =3D (U32)(ip - base) - offset_1; const BYTE* repMatch =3D repIndex < prefixLowestIndex ? dictBase + (repIndex - dictIndexDelta) : base + repIndex; - if (((U32)((prefixLowestIndex-1) - repIndex) >=3D 3 /*= intentional underflow */) + if ((ZSTD_index_overlap_check(prefixLowestIndex, repIn= dex)) && (MEM_read32(repMatch) =3D=3D MEM_read32(ip)) ) { const BYTE* repMatchEnd =3D repIndex < prefixLowes= tIndex ? 
dictEnd : iend; size_t const mlRep =3D ZSTD_count_2segments(ip+4, = repMatch+4, iend, repMatchEnd, prefixLowest) + 4; int const gain2 =3D (int)(mlRep * 4); - int const gain1 =3D (int)(matchLength*4 - ZSTD_hig= hbit32((U32)STORED_TO_OFFBASE(offcode)) + 1); + int const gain1 =3D (int)(matchLength*4 - ZSTD_hig= hbit32((U32)offBase) + 1); if ((mlRep >=3D 4) && (gain2 > gain1)) - matchLength =3D mlRep, offcode =3D STORE_REPCO= DE_1, start =3D ip; + matchLength =3D mlRep, offBase =3D REPCODE1_TO= _OFFBASE, start =3D ip; } } - { size_t offset2=3D999999999; - size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &off= set2, mls, rowLog, searchMethod, dictMode); - int const gain2 =3D (int)(ml2*4 - ZSTD_highbit32((U32)= STORED_TO_OFFBASE(offset2))); /* raw approx */ - int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit= 32((U32)STORED_TO_OFFBASE(offcode)) + 7); + { size_t ofbCandidate=3D999999999; + size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &ofb= Candidate, mls, rowLog, searchMethod, dictMode); + int const gain2 =3D (int)(ml2*4 - ZSTD_highbit32((U32)= ofbCandidate)); /* raw approx */ + int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit= 32((U32)offBase) + 7); if ((ml2 >=3D 4) && (gain2 > gain1)) { - matchLength =3D ml2, offcode =3D offset2, start = =3D ip; + matchLength =3D ml2, offBase =3D ofbCandidate, sta= rt =3D ip; continue; } } } break; /* nothing found : store previous solution */ @@ -1653,26 +1705,33 @@ ZSTD_compressBlock_lazy_generic( * notably if `value` is unsigned, resulting in a large positive `= -value`. */ /* catch up */ - if (STORED_IS_OFFSET(offcode)) { + if (OFFBASE_IS_OFFSET(offBase)) { if (dictMode =3D=3D ZSTD_noDict) { - while ( ((start > anchor) & (start - STORED_OFFSET(offcode= ) > prefixLowest)) - && (start[-1] =3D=3D (start-STORED_OFFSET(offcode))[-= 1]) ) /* only search for offset within prefix */ + while ( ((start > anchor) & (start - OFFBASE_TO_OFFSET(off= Base) > prefixLowest)) + && (start[-1] =3D=3D (start-OFFBASE_TO_OFFSET(offBase= ))[-1]) ) /* only search for offset within prefix */ { start--; matchLength++; } } if (isDxS) { - U32 const matchIndex =3D (U32)((size_t)(start-base) - STOR= ED_OFFSET(offcode)); + U32 const matchIndex =3D (U32)((size_t)(start-base) - OFFB= ASE_TO_OFFSET(offBase)); const BYTE* match =3D (matchIndex < prefixLowestIndex) ? d= ictBase + matchIndex - dictIndexDelta : base + matchIndex; const BYTE* const mStart =3D (matchIndex < prefixLowestInd= ex) ? dictLowest : prefixLowest; while ((start>anchor) && (match>mStart) && (start[-1] =3D= =3D match[-1])) { start--; match--; matchLength++; } /* catch up */ } - offset_2 =3D offset_1; offset_1 =3D (U32)STORED_OFFSET(offcode= ); + offset_2 =3D offset_1; offset_1 =3D (U32)OFFBASE_TO_OFFSET(off= Base); } /* store sequence */ _storeSequence: { size_t const litLength =3D (size_t)(start - anchor); - ZSTD_storeSeq(seqStore, litLength, anchor, iend, (U32)offcode,= matchLength); + ZSTD_storeSeq(seqStore, litLength, anchor, iend, (U32)offBase,= matchLength); anchor =3D ip =3D start + matchLength; } + if (ms->lazySkipping) { + /* We've found a match, disable lazy skipping mode, and refill= the hash cache. */ + if (searchMethod =3D=3D search_rowHash) { + ZSTD_row_fillHashCache(ms, base, rowLog, mls, ms->nextToUp= date, ilimit); + } + ms->lazySkipping =3D 0; + } =20 /* check immediate repcode */ if (isDxS) { @@ -1682,12 +1741,12 @@ ZSTD_compressBlock_lazy_generic( const BYTE* repMatch =3D repIndex < prefixLowestIndex ? 
dictBase - dictIndexDelta + repIndex : base + repIndex; - if ( ((U32)((prefixLowestIndex-1) - (U32)repIndex) >=3D 3 = /* intentional overflow */) + if ( (ZSTD_index_overlap_check(prefixLowestIndex, repIndex= )) && (MEM_read32(repMatch) =3D=3D MEM_read32(ip)) ) { const BYTE* const repEnd2 =3D repIndex < prefixLowestI= ndex ? dictEnd : iend; matchLength =3D ZSTD_count_2segments(ip+4, repMatch+4,= iend, repEnd2, prefixLowest) + 4; - offcode =3D offset_2; offset_2 =3D offset_1; offset_1 = =3D (U32)offcode; /* swap offset_2 <=3D> offset_1 */ - ZSTD_storeSeq(seqStore, 0, anchor, iend, STORE_REPCODE= _1, matchLength); + offBase =3D offset_2; offset_2 =3D offset_1; offset_1 = =3D (U32)offBase; /* swap offset_2 <=3D> offset_1 */ + ZSTD_storeSeq(seqStore, 0, anchor, iend, REPCODE1_TO_O= FFBASE, matchLength); ip +=3D matchLength; anchor =3D ip; continue; @@ -1701,168 +1760,183 @@ ZSTD_compressBlock_lazy_generic( && (MEM_read32(ip) =3D=3D MEM_read32(ip - offset_2)) ) { /* store sequence */ matchLength =3D ZSTD_count(ip+4, ip+4-offset_2, iend) + 4; - offcode =3D offset_2; offset_2 =3D offset_1; offset_1 =3D = (U32)offcode; /* swap repcodes */ - ZSTD_storeSeq(seqStore, 0, anchor, iend, STORE_REPCODE_1, = matchLength); + offBase =3D offset_2; offset_2 =3D offset_1; offset_1 =3D = (U32)offBase; /* swap repcodes */ + ZSTD_storeSeq(seqStore, 0, anchor, iend, REPCODE1_TO_OFFBA= SE, matchLength); ip +=3D matchLength; anchor =3D ip; continue; /* faster when present ... (?) */ } } } =20 - /* Save reps for next block */ - rep[0] =3D offset_1 ? offset_1 : savedOffset; - rep[1] =3D offset_2 ? offset_2 : savedOffset; + /* If offset_1 started invalid (offsetSaved1 !=3D 0) and became valid = (offset_1 !=3D 0), + * rotate saved offsets. See comment in ZSTD_compressBlock_fast_noDict= for more context. */ + offsetSaved2 =3D ((offsetSaved1 !=3D 0) && (offset_1 !=3D 0)) ? offset= Saved1 : offsetSaved2; + + /* save reps for next block */ + rep[0] =3D offset_1 ? offset_1 : offsetSaved1; + rep[1] =3D offset_2 ? 
offset_2 : offsetSaved2; =20 /* Return the last literals size */ return (size_t)(iend - anchor); } +#endif /* build exclusions */ =20 =20 -size_t ZSTD_compressBlock_btlazy2( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_GREEDY_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_greedy( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_binaryTree, 2, ZSTD_noDict); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 0, ZSTD_noDict); } =20 -size_t ZSTD_compressBlock_lazy2( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 2, ZSTD_noDict); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 0, ZSTD_dictMatchState); } =20 -size_t ZSTD_compressBlock_lazy( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_dedicatedDictSearch( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 1, ZSTD_noDict); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 0, ZSTD_dedicatedDictSearch); } =20 -size_t ZSTD_compressBlock_greedy( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 0, ZSTD_noDict); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 0, ZSTD_noDict); } =20 -size_t ZSTD_compressBlock_btlazy2_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_dictMatchState_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_binaryTree, 2, ZSTD_dictMatchState); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 0, ZSTD_dictMatchState); } =20 -size_t ZSTD_compressBlock_lazy2_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_dedicatedDictSearch_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 2, ZSTD_dictMatchState); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 0, ZSTD_dedicatedDictSearch); } +#endif =20 -size_t ZSTD_compressBlock_lazy_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_LAZY_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_lazy( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, 
src, srcSize= , search_hashChain, 1, ZSTD_dictMatchState); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 1, ZSTD_noDict); } =20 -size_t ZSTD_compressBlock_greedy_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 0, ZSTD_dictMatchState); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 1, ZSTD_dictMatchState); } =20 - -size_t ZSTD_compressBlock_lazy2_dedicatedDictSearch( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_dedicatedDictSearch( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 2, ZSTD_dedicatedDictSearch); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 1, ZSTD_dedicatedDictSearch); } =20 -size_t ZSTD_compressBlock_lazy_dedicatedDictSearch( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 1, ZSTD_dedicatedDictSearch); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 1, ZSTD_noDict); } =20 -size_t ZSTD_compressBlock_greedy_dedicatedDictSearch( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_dictMatchState_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 0, ZSTD_dedicatedDictSearch); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 1, ZSTD_dictMatchState); } =20 -/* Row-based matchfinder */ -size_t ZSTD_compressBlock_lazy2_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_dedicatedDictSearch_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 2, ZSTD_noDict); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 1, ZSTD_dedicatedDictSearch); } +#endif =20 -size_t ZSTD_compressBlock_lazy_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_LAZY2_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_lazy2( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 1, ZSTD_noDict); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 2, ZSTD_noDict); } =20 -size_t ZSTD_compressBlock_greedy_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], 
void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 0, ZSTD_noDict); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 2, ZSTD_dictMatchState); } =20 -size_t ZSTD_compressBlock_lazy2_dictMatchState_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_dedicatedDictSearch( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 2, ZSTD_dictMatchState); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 2, ZSTD_dedicatedDictSearch); } =20 -size_t ZSTD_compressBlock_lazy_dictMatchState_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 1, ZSTD_dictMatchState); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 2, ZSTD_noDict); } =20 -size_t ZSTD_compressBlock_greedy_dictMatchState_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_dictMatchState_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 0, ZSTD_dictMatchState); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 2, ZSTD_dictMatchState); } =20 - size_t ZSTD_compressBlock_lazy2_dedicatedDictSearch_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 2, ZSTD_dedicatedDictSearch); } +#endif =20 -size_t ZSTD_compressBlock_lazy_dedicatedDictSearch_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_btlazy2( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 1, ZSTD_dedicatedDictSearch); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_binaryTree, 2, ZSTD_noDict); } =20 -size_t ZSTD_compressBlock_greedy_dedicatedDictSearch_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_btlazy2_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 0, ZSTD_dedicatedDictSearch); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_binaryTree, 2, ZSTD_dictMatchState); } +#endif =20 +#if !defined(ZSTD_EXCLUDE_GREEDY_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_LAZY_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_LAZY2_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR) FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR 
size_t ZSTD_compressBlock_lazy_extDict_generic( - ZSTD_matchState_t* ms, seqStore_t* seqStore, + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize, const searchMethod_e searchMethod, const U32 depth) @@ -1886,12 +1960,13 @@ size_t ZSTD_compressBlock_lazy_extDict_generic( =20 DEBUGLOG(5, "ZSTD_compressBlock_lazy_extDict_generic (searchFunc=3D%u)= ", (U32)searchMethod); =20 + /* Reset the lazy skipping state */ + ms->lazySkipping =3D 0; + /* init */ ip +=3D (ip =3D=3D prefixStart); if (searchMethod =3D=3D search_rowHash) { - ZSTD_row_fillHashCache(ms, base, rowLog, - MIN(ms->cParams.minMatch, 6 /* mls caps out= at 6 */), - ms->nextToUpdate, ilimit); + ZSTD_row_fillHashCache(ms, base, rowLog, mls, ms->nextToUpdate, il= imit); } =20 /* Match Loop */ @@ -1903,7 +1978,7 @@ size_t ZSTD_compressBlock_lazy_extDict_generic( #endif while (ip < ilimit) { size_t matchLength=3D0; - size_t offcode=3DSTORE_REPCODE_1; + size_t offBase =3D REPCODE1_TO_OFFBASE; const BYTE* start=3Dip+1; U32 curr =3D (U32)(ip-base); =20 @@ -1912,7 +1987,7 @@ size_t ZSTD_compressBlock_lazy_extDict_generic( const U32 repIndex =3D (U32)(curr+1 - offset_1); const BYTE* const repBase =3D repIndex < dictLimit ? dictBase = : base; const BYTE* const repMatch =3D repBase + repIndex; - if ( ((U32)((dictLimit-1) - repIndex) >=3D 3) /* intentional o= verflow */ + if ( (ZSTD_index_overlap_check(dictLimit, repIndex)) & (offset_1 <=3D curr+1 - windowLow) ) /* note: we are sear= ching at curr+1 */ if (MEM_read32(ip+1) =3D=3D MEM_read32(repMatch)) { /* repcode detected we should take it */ @@ -1922,14 +1997,23 @@ size_t ZSTD_compressBlock_lazy_extDict_generic( } } =20 /* first search (depth 0) */ - { size_t offsetFound =3D 999999999; - size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &offsetFound= , mls, rowLog, searchMethod, ZSTD_extDict); + { size_t ofbCandidate =3D 999999999; + size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &ofbCandidat= e, mls, rowLog, searchMethod, ZSTD_extDict); if (ml2 > matchLength) - matchLength =3D ml2, start =3D ip, offcode=3DoffsetFound; + matchLength =3D ml2, start =3D ip, offBase =3D ofbCandidat= e; } =20 if (matchLength < 4) { - ip +=3D ((ip-anchor) >> kSearchStrength) + 1; /* jump faster= over incompressible sections */ + size_t const step =3D ((size_t)(ip-anchor) >> kSearchStrength); + ip +=3D step + 1; /* jump faster over incompressible section= s */ + /* Enter the lazy skipping mode once we are skipping more than= 8 bytes at a time. + * In this mode we stop inserting every position into our tabl= es, and only insert + * positions that we search, which is one in step positions. + * The exact cutoff is flexible, I've just chosen a number tha= t is reasonably high, + * so we minimize the compression ratio loss in "normal" scena= rios. This mode gets + * triggered once we've gone 2KB without finding any matches. + */ + ms->lazySkipping =3D step > kLazySkippingStep; continue; } =20 @@ -1939,30 +2023,30 @@ size_t ZSTD_compressBlock_lazy_extDict_generic( ip ++; curr++; /* check repCode */ - if (offcode) { + if (offBase) { const U32 windowLow =3D ZSTD_getLowestMatchIndex(ms, curr,= windowLog); const U32 repIndex =3D (U32)(curr - offset_1); const BYTE* const repBase =3D repIndex < dictLimit ? 
dictB= ase : base; const BYTE* const repMatch =3D repBase + repIndex; - if ( ((U32)((dictLimit-1) - repIndex) >=3D 3) /* intention= al overflow : do not test positions overlapping 2 memory segments */ + if ( (ZSTD_index_overlap_check(dictLimit, repIndex)) & (offset_1 <=3D curr - windowLow) ) /* equivalent to `= curr > repIndex >=3D windowLow` */ if (MEM_read32(ip) =3D=3D MEM_read32(repMatch)) { /* repcode detected */ const BYTE* const repEnd =3D repIndex < dictLimit ? di= ctEnd : iend; size_t const repLength =3D ZSTD_count_2segments(ip+4, = repMatch+4, iend, repEnd, prefixStart) + 4; int const gain2 =3D (int)(repLength * 3); - int const gain1 =3D (int)(matchLength*3 - ZSTD_highbit= 32((U32)STORED_TO_OFFBASE(offcode)) + 1); + int const gain1 =3D (int)(matchLength*3 - ZSTD_highbit= 32((U32)offBase) + 1); if ((repLength >=3D 4) && (gain2 > gain1)) - matchLength =3D repLength, offcode =3D STORE_REPCO= DE_1, start =3D ip; + matchLength =3D repLength, offBase =3D REPCODE1_TO= _OFFBASE, start =3D ip; } } =20 /* search match, depth 1 */ - { size_t offset2=3D999999999; - size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &offset2= , mls, rowLog, searchMethod, ZSTD_extDict); - int const gain2 =3D (int)(ml2*4 - ZSTD_highbit32((U32)STOR= ED_TO_OFFBASE(offset2))); /* raw approx */ - int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit32((= U32)STORED_TO_OFFBASE(offcode)) + 4); + { size_t ofbCandidate =3D 999999999; + size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &ofbCand= idate, mls, rowLog, searchMethod, ZSTD_extDict); + int const gain2 =3D (int)(ml2*4 - ZSTD_highbit32((U32)ofbC= andidate)); /* raw approx */ + int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit32((= U32)offBase) + 4); if ((ml2 >=3D 4) && (gain2 > gain1)) { - matchLength =3D ml2, offcode =3D offset2, start =3D ip; + matchLength =3D ml2, offBase =3D ofbCandidate, start = =3D ip; continue; /* search a better one */ } } =20 @@ -1971,50 +2055,57 @@ size_t ZSTD_compressBlock_lazy_extDict_generic( ip ++; curr++; /* check repCode */ - if (offcode) { + if (offBase) { const U32 windowLow =3D ZSTD_getLowestMatchIndex(ms, c= urr, windowLog); const U32 repIndex =3D (U32)(curr - offset_1); const BYTE* const repBase =3D repIndex < dictLimit ? d= ictBase : base; const BYTE* const repMatch =3D repBase + repIndex; - if ( ((U32)((dictLimit-1) - repIndex) >=3D 3) /* inten= tional overflow : do not test positions overlapping 2 memory segments */ + if ( (ZSTD_index_overlap_check(dictLimit, repIndex)) & (offset_1 <=3D curr - windowLow) ) /* equivalent = to `curr > repIndex >=3D windowLow` */ if (MEM_read32(ip) =3D=3D MEM_read32(repMatch)) { /* repcode detected */ const BYTE* const repEnd =3D repIndex < dictLimit = ? 
dictEnd : iend; size_t const repLength =3D ZSTD_count_2segments(ip= +4, repMatch+4, iend, repEnd, prefixStart) + 4; int const gain2 =3D (int)(repLength * 4); - int const gain1 =3D (int)(matchLength*4 - ZSTD_hig= hbit32((U32)STORED_TO_OFFBASE(offcode)) + 1); + int const gain1 =3D (int)(matchLength*4 - ZSTD_hig= hbit32((U32)offBase) + 1); if ((repLength >=3D 4) && (gain2 > gain1)) - matchLength =3D repLength, offcode =3D STORE_R= EPCODE_1, start =3D ip; + matchLength =3D repLength, offBase =3D REPCODE= 1_TO_OFFBASE, start =3D ip; } } =20 /* search match, depth 2 */ - { size_t offset2=3D999999999; - size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &off= set2, mls, rowLog, searchMethod, ZSTD_extDict); - int const gain2 =3D (int)(ml2*4 - ZSTD_highbit32((U32)= STORED_TO_OFFBASE(offset2))); /* raw approx */ - int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit= 32((U32)STORED_TO_OFFBASE(offcode)) + 7); + { size_t ofbCandidate =3D 999999999; + size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &ofb= Candidate, mls, rowLog, searchMethod, ZSTD_extDict); + int const gain2 =3D (int)(ml2*4 - ZSTD_highbit32((U32)= ofbCandidate)); /* raw approx */ + int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit= 32((U32)offBase) + 7); if ((ml2 >=3D 4) && (gain2 > gain1)) { - matchLength =3D ml2, offcode =3D offset2, start = =3D ip; + matchLength =3D ml2, offBase =3D ofbCandidate, sta= rt =3D ip; continue; } } } break; /* nothing found : store previous solution */ } =20 /* catch up */ - if (STORED_IS_OFFSET(offcode)) { - U32 const matchIndex =3D (U32)((size_t)(start-base) - STORED_O= FFSET(offcode)); + if (OFFBASE_IS_OFFSET(offBase)) { + U32 const matchIndex =3D (U32)((size_t)(start-base) - OFFBASE_= TO_OFFSET(offBase)); const BYTE* match =3D (matchIndex < dictLimit) ? dictBase + ma= tchIndex : base + matchIndex; const BYTE* const mStart =3D (matchIndex < dictLimit) ? dictSt= art : prefixStart; while ((start>anchor) && (match>mStart) && (start[-1] =3D=3D m= atch[-1])) { start--; match--; matchLength++; } /* catch up */ - offset_2 =3D offset_1; offset_1 =3D (U32)STORED_OFFSET(offcode= ); + offset_2 =3D offset_1; offset_1 =3D (U32)OFFBASE_TO_OFFSET(off= Base); } =20 /* store sequence */ _storeSequence: { size_t const litLength =3D (size_t)(start - anchor); - ZSTD_storeSeq(seqStore, litLength, anchor, iend, (U32)offcode,= matchLength); + ZSTD_storeSeq(seqStore, litLength, anchor, iend, (U32)offBase,= matchLength); anchor =3D ip =3D start + matchLength; } + if (ms->lazySkipping) { + /* We've found a match, disable lazy skipping mode, and refill= the hash cache. */ + if (searchMethod =3D=3D search_rowHash) { + ZSTD_row_fillHashCache(ms, base, rowLog, mls, ms->nextToUp= date, ilimit); + } + ms->lazySkipping =3D 0; + } =20 /* check immediate repcode */ while (ip <=3D ilimit) { @@ -2023,14 +2114,14 @@ size_t ZSTD_compressBlock_lazy_extDict_generic( const U32 repIndex =3D repCurrent - offset_2; const BYTE* const repBase =3D repIndex < dictLimit ? dictBase = : base; const BYTE* const repMatch =3D repBase + repIndex; - if ( ((U32)((dictLimit-1) - repIndex) >=3D 3) /* intentional o= verflow : do not test positions overlapping 2 memory segments */ + if ( (ZSTD_index_overlap_check(dictLimit, repIndex)) & (offset_2 <=3D repCurrent - windowLow) ) /* equivalent to= `curr > repIndex >=3D windowLow` */ if (MEM_read32(ip) =3D=3D MEM_read32(repMatch)) { /* repcode detected we should take it */ const BYTE* const repEnd =3D repIndex < dictLimit ? 
dictEn= d : iend; matchLength =3D ZSTD_count_2segments(ip+4, repMatch+4, ien= d, repEnd, prefixStart) + 4; - offcode =3D offset_2; offset_2 =3D offset_1; offset_1 =3D = (U32)offcode; /* swap offset history */ - ZSTD_storeSeq(seqStore, 0, anchor, iend, STORE_REPCODE_1, = matchLength); + offBase =3D offset_2; offset_2 =3D offset_1; offset_1 =3D = (U32)offBase; /* swap offset history */ + ZSTD_storeSeq(seqStore, 0, anchor, iend, REPCODE1_TO_OFFBA= SE, matchLength); ip +=3D matchLength; anchor =3D ip; continue; /* faster when present ... (?) */ @@ -2045,58 +2136,65 @@ size_t ZSTD_compressBlock_lazy_extDict_generic( /* Return the last literals size */ return (size_t)(iend - anchor); } +#endif /* build exclusions */ =20 - +#ifndef ZSTD_EXCLUDE_GREEDY_BLOCK_COMPRESSOR size_t ZSTD_compressBlock_greedy_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_hashChain, 0); } =20 -size_t ZSTD_compressBlock_lazy_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_extDict_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) - { - return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_hashChain, 1); + return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_rowHash, 0); } +#endif =20 -size_t ZSTD_compressBlock_lazy2_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_LAZY_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_lazy_extDict( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) =20 { - return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_hashChain, 2); + return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_hashChain, 1); } =20 -size_t ZSTD_compressBlock_btlazy2_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_extDict_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) =20 { - return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_binaryTree, 2); + return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_rowHash, 1); } +#endif =20 -size_t ZSTD_compressBlock_greedy_extDict_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_LAZY2_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_lazy2_extDict( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) + { - return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_rowHash, 0); + return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_hashChain, 2); } =20 -size_t ZSTD_compressBlock_lazy_extDict_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_extDict_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) - { - return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_rowHash, 1); + return ZSTD_compressBlock_lazy_extDict_generic(ms, 
seqStore, rep, src,= srcSize, search_rowHash, 2); } +#endif =20 -size_t ZSTD_compressBlock_lazy2_extDict_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_btlazy2_extDict( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) =20 { - return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_rowHash, 2); + return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_binaryTree, 2); } +#endif diff --git a/lib/zstd/compress/zstd_lazy.h b/lib/zstd/compress/zstd_lazy.h index e5bdf4df8dde..987a036d8bde 100644 --- a/lib/zstd/compress/zstd_lazy.h +++ b/lib/zstd/compress/zstd_lazy.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -11,7 +12,6 @@ #ifndef ZSTD_LAZY_H #define ZSTD_LAZY_H =20 - #include "zstd_compress_internal.h" =20 /* @@ -22,98 +22,173 @@ */ #define ZSTD_LAZY_DDSS_BUCKET_LOG 2 =20 -U32 ZSTD_insertAndFindFirstIndex(ZSTD_matchState_t* ms, const BYTE* ip); -void ZSTD_row_update(ZSTD_matchState_t* const ms, const BYTE* ip); +#define ZSTD_ROW_HASH_TAG_BITS 8 /* nb bits to use for the tag */ + +#if !defined(ZSTD_EXCLUDE_GREEDY_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_LAZY_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_LAZY2_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR) +U32 ZSTD_insertAndFindFirstIndex(ZSTD_MatchState_t* ms, const BYTE* ip); +void ZSTD_row_update(ZSTD_MatchState_t* const ms, const BYTE* ip); =20 -void ZSTD_dedicatedDictSearch_lazy_loadDictionary(ZSTD_matchState_t* ms, c= onst BYTE* const ip); +void ZSTD_dedicatedDictSearch_lazy_loadDictionary(ZSTD_MatchState_t* ms, c= onst BYTE* const ip); =20 void ZSTD_preserveUnsortedMark (U32* const table, U32 const size, U32 cons= t reducerValue); /*! used in ZSTD_reduceIndex(). 
preemptively increase val= ue of ZSTD_DUBT_UNSORTED_MARK */ +#endif =20 -size_t ZSTD_compressBlock_btlazy2( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_GREEDY_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_greedy( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy2( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_greedy( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_dictMatchState_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy2_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_dedicatedDictSearch( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_dedicatedDictSearch_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_greedy_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_extDict( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + void const* src, size_t srcSize); +size_t ZSTD_compressBlock_greedy_extDict_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); =20 -size_t ZSTD_compressBlock_btlazy2_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#define ZSTD_COMPRESSBLOCK_GREEDY ZSTD_compressBlock_greedy +#define ZSTD_COMPRESSBLOCK_GREEDY_ROW ZSTD_compressBlock_greedy_row +#define ZSTD_COMPRESSBLOCK_GREEDY_DICTMATCHSTATE ZSTD_compressBlock_greedy= _dictMatchState +#define ZSTD_COMPRESSBLOCK_GREEDY_DICTMATCHSTATE_ROW ZSTD_compressBlock_gr= eedy_dictMatchState_row +#define ZSTD_COMPRESSBLOCK_GREEDY_DEDICATEDDICTSEARCH ZSTD_compressBlock_g= reedy_dedicatedDictSearch +#define ZSTD_COMPRESSBLOCK_GREEDY_DEDICATEDDICTSEARCH_ROW ZSTD_compressBlo= ck_greedy_dedicatedDictSearch_row +#define ZSTD_COMPRESSBLOCK_GREEDY_EXTDICT ZSTD_compressBlock_greedy_extDict +#define ZSTD_COMPRESSBLOCK_GREEDY_EXTDICT_ROW ZSTD_compressBlock_greedy_ex= tDict_row +#else +#define ZSTD_COMPRESSBLOCK_GREEDY NULL +#define ZSTD_COMPRESSBLOCK_GREEDY_ROW NULL +#define ZSTD_COMPRESSBLOCK_GREEDY_DICTMATCHSTATE NULL +#define ZSTD_COMPRESSBLOCK_GREEDY_DICTMATCHSTATE_ROW NULL +#define ZSTD_COMPRESSBLOCK_GREEDY_DEDICATEDDICTSEARCH NULL +#define ZSTD_COMPRESSBLOCK_GREEDY_DEDICATEDDICTSEARCH_ROW NULL +#define ZSTD_COMPRESSBLOCK_GREEDY_EXTDICT NULL +#define ZSTD_COMPRESSBLOCK_GREEDY_EXTDICT_ROW NULL +#endif + +#ifndef ZSTD_EXCLUDE_LAZY_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_lazy( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t 
ZSTD_compressBlock_lazy2_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_lazy_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], - void const* src, size_t srcSize); -size_t ZSTD_compressBlock_greedy_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], - void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy2_dictMatchState_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_lazy_dictMatchState_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_greedy_dictMatchState_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_dedicatedDictSearch( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); - -size_t ZSTD_compressBlock_lazy2_dedicatedDictSearch( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_dedicatedDictSearch_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy_dedicatedDictSearch( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_extDict( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_greedy_dedicatedDictSearch( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_extDict_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy2_dedicatedDictSearch_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + +#define ZSTD_COMPRESSBLOCK_LAZY ZSTD_compressBlock_lazy +#define ZSTD_COMPRESSBLOCK_LAZY_ROW ZSTD_compressBlock_lazy_row +#define ZSTD_COMPRESSBLOCK_LAZY_DICTMATCHSTATE ZSTD_compressBlock_lazy_dic= tMatchState +#define ZSTD_COMPRESSBLOCK_LAZY_DICTMATCHSTATE_ROW ZSTD_compressBlock_lazy= _dictMatchState_row +#define ZSTD_COMPRESSBLOCK_LAZY_DEDICATEDDICTSEARCH ZSTD_compressBlock_laz= y_dedicatedDictSearch +#define ZSTD_COMPRESSBLOCK_LAZY_DEDICATEDDICTSEARCH_ROW ZSTD_compressBlock= _lazy_dedicatedDictSearch_row +#define ZSTD_COMPRESSBLOCK_LAZY_EXTDICT ZSTD_compressBlock_lazy_extDict +#define ZSTD_COMPRESSBLOCK_LAZY_EXTDICT_ROW ZSTD_compressBlock_lazy_extDic= t_row +#else +#define ZSTD_COMPRESSBLOCK_LAZY NULL +#define ZSTD_COMPRESSBLOCK_LAZY_ROW NULL +#define ZSTD_COMPRESSBLOCK_LAZY_DICTMATCHSTATE NULL +#define ZSTD_COMPRESSBLOCK_LAZY_DICTMATCHSTATE_ROW NULL +#define ZSTD_COMPRESSBLOCK_LAZY_DEDICATEDDICTSEARCH NULL +#define ZSTD_COMPRESSBLOCK_LAZY_DEDICATEDDICTSEARCH_ROW NULL +#define ZSTD_COMPRESSBLOCK_LAZY_EXTDICT NULL +#define ZSTD_COMPRESSBLOCK_LAZY_EXTDICT_ROW NULL +#endif + +#ifndef ZSTD_EXCLUDE_LAZY2_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_lazy2( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t 
ZSTD_compressBlock_lazy_dedicatedDictSearch_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_greedy_dedicatedDictSearch_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); - -size_t ZSTD_compressBlock_greedy_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_dictMatchState_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_dedicatedDictSearch( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + void const* src, size_t srcSize); +size_t ZSTD_compressBlock_lazy2_dedicatedDictSearch_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_lazy2_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_greedy_extDict_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_extDict_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy_extDict_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + +#define ZSTD_COMPRESSBLOCK_LAZY2 ZSTD_compressBlock_lazy2 +#define ZSTD_COMPRESSBLOCK_LAZY2_ROW ZSTD_compressBlock_lazy2_row +#define ZSTD_COMPRESSBLOCK_LAZY2_DICTMATCHSTATE ZSTD_compressBlock_lazy2_d= ictMatchState +#define ZSTD_COMPRESSBLOCK_LAZY2_DICTMATCHSTATE_ROW ZSTD_compressBlock_laz= y2_dictMatchState_row +#define ZSTD_COMPRESSBLOCK_LAZY2_DEDICATEDDICTSEARCH ZSTD_compressBlock_la= zy2_dedicatedDictSearch +#define ZSTD_COMPRESSBLOCK_LAZY2_DEDICATEDDICTSEARCH_ROW ZSTD_compressBloc= k_lazy2_dedicatedDictSearch_row +#define ZSTD_COMPRESSBLOCK_LAZY2_EXTDICT ZSTD_compressBlock_lazy2_extDict +#define ZSTD_COMPRESSBLOCK_LAZY2_EXTDICT_ROW ZSTD_compressBlock_lazy2_extD= ict_row +#else +#define ZSTD_COMPRESSBLOCK_LAZY2 NULL +#define ZSTD_COMPRESSBLOCK_LAZY2_ROW NULL +#define ZSTD_COMPRESSBLOCK_LAZY2_DICTMATCHSTATE NULL +#define ZSTD_COMPRESSBLOCK_LAZY2_DICTMATCHSTATE_ROW NULL +#define ZSTD_COMPRESSBLOCK_LAZY2_DEDICATEDDICTSEARCH NULL +#define ZSTD_COMPRESSBLOCK_LAZY2_DEDICATEDDICTSEARCH_ROW NULL +#define ZSTD_COMPRESSBLOCK_LAZY2_EXTDICT NULL +#define ZSTD_COMPRESSBLOCK_LAZY2_EXTDICT_ROW NULL +#endif + +#ifndef ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_btlazy2( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy2_extDict_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_btlazy2_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_btlazy2_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + 
ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); - =20 =20 +#define ZSTD_COMPRESSBLOCK_BTLAZY2 ZSTD_compressBlock_btlazy2 +#define ZSTD_COMPRESSBLOCK_BTLAZY2_DICTMATCHSTATE ZSTD_compressBlock_btlaz= y2_dictMatchState +#define ZSTD_COMPRESSBLOCK_BTLAZY2_EXTDICT ZSTD_compressBlock_btlazy2_extD= ict +#else +#define ZSTD_COMPRESSBLOCK_BTLAZY2 NULL +#define ZSTD_COMPRESSBLOCK_BTLAZY2_DICTMATCHSTATE NULL +#define ZSTD_COMPRESSBLOCK_BTLAZY2_EXTDICT NULL +#endif =20 #endif /* ZSTD_LAZY_H */ diff --git a/lib/zstd/compress/zstd_ldm.c b/lib/zstd/compress/zstd_ldm.c index dd86fc83e7dd..54eefad9cae6 100644 --- a/lib/zstd/compress/zstd_ldm.c +++ b/lib/zstd/compress/zstd_ldm.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -16,7 +17,7 @@ #include "zstd_double_fast.h" /* ZSTD_fillDoubleHashTable() */ #include "zstd_ldm_geartab.h" =20 -#define LDM_BUCKET_SIZE_LOG 3 +#define LDM_BUCKET_SIZE_LOG 4 #define LDM_MIN_MATCH_LENGTH 64 #define LDM_HASH_RLOG 7 =20 @@ -133,21 +134,35 @@ static size_t ZSTD_ldm_gear_feed(ldmRollingHashState_= t* state, } =20 void ZSTD_ldm_adjustParameters(ldmParams_t* params, - ZSTD_compressionParameters const* cParams) + const ZSTD_compressionParameters* cParams) { params->windowLog =3D cParams->windowLog; ZSTD_STATIC_ASSERT(LDM_BUCKET_SIZE_LOG <=3D ZSTD_LDM_BUCKETSIZELOG_MAX= ); DEBUGLOG(4, "ZSTD_ldm_adjustParameters"); - if (!params->bucketSizeLog) params->bucketSizeLog =3D LDM_BUCKET_SIZE_= LOG; - if (!params->minMatchLength) params->minMatchLength =3D LDM_MIN_MATCH_= LENGTH; + if (params->hashRateLog =3D=3D 0) { + if (params->hashLog > 0) { + /* if params->hashLog is set, derive hashRateLog from it */ + assert(params->hashLog <=3D ZSTD_HASHLOG_MAX); + if (params->windowLog > params->hashLog) { + params->hashRateLog =3D params->windowLog - params->hashLo= g; + } + } else { + assert(1 <=3D (int)cParams->strategy && (int)cParams->strategy= <=3D 9); + /* mapping from [fast, rate7] to [btultra2, rate4] */ + params->hashRateLog =3D 7 - (cParams->strategy/3); + } + } if (params->hashLog =3D=3D 0) { - params->hashLog =3D MAX(ZSTD_HASHLOG_MIN, params->windowLog - LDM_= HASH_RLOG); - assert(params->hashLog <=3D ZSTD_HASHLOG_MAX); + params->hashLog =3D BOUNDED(ZSTD_HASHLOG_MIN, params->windowLog - = params->hashRateLog, ZSTD_HASHLOG_MAX); } - if (params->hashRateLog =3D=3D 0) { - params->hashRateLog =3D params->windowLog < params->hashLog - ? 0 - : params->windowLog - params->hashLog; + if (params->minMatchLength =3D=3D 0) { + params->minMatchLength =3D LDM_MIN_MATCH_LENGTH; + if (cParams->strategy >=3D ZSTD_btultra) + params->minMatchLength /=3D 2; + } + if (params->bucketSizeLog=3D=3D0) { + assert(1 <=3D (int)cParams->strategy && (int)cParams->strategy <= =3D 9); + params->bucketSizeLog =3D BOUNDED(LDM_BUCKET_SIZE_LOG, (U32)cParam= s->strategy, ZSTD_LDM_BUCKETSIZELOG_MAX); } params->bucketSizeLog =3D MIN(params->bucketSizeLog, params->hashLog); } @@ -170,22 +185,22 @@ size_t ZSTD_ldm_getMaxNbSeq(ldmParams_t params, size_= t maxChunkSize) /* ZSTD_ldm_getBucket() : * Returns a pointer to the start of the bucket associated with hash. 
*/ static ldmEntry_t* ZSTD_ldm_getBucket( - ldmState_t* ldmState, size_t hash, ldmParams_t const ldmParams) + const ldmState_t* ldmState, size_t hash, U32 const bucketSizeLog) { - return ldmState->hashTable + (hash << ldmParams.bucketSizeLog); + return ldmState->hashTable + (hash << bucketSizeLog); } =20 /* ZSTD_ldm_insertEntry() : * Insert the entry with corresponding hash into the hash table */ static void ZSTD_ldm_insertEntry(ldmState_t* ldmState, size_t const hash, const ldmEntry_t entry, - ldmParams_t const ldmParams) + U32 const bucketSizeLog) { BYTE* const pOffset =3D ldmState->bucketOffsets + hash; unsigned const offset =3D *pOffset; =20 - *(ZSTD_ldm_getBucket(ldmState, hash, ldmParams) + offset) =3D entry; - *pOffset =3D (BYTE)((offset + 1) & ((1u << ldmParams.bucketSizeLog) - = 1)); + *(ZSTD_ldm_getBucket(ldmState, hash, bucketSizeLog) + offset) =3D entr= y; + *pOffset =3D (BYTE)((offset + 1) & ((1u << bucketSizeLog) - 1)); =20 } =20 @@ -234,7 +249,7 @@ static size_t ZSTD_ldm_countBackwardsMatch_2segments( * * The tables for the other strategies are filled within their * block compressors. */ -static size_t ZSTD_ldm_fillFastTables(ZSTD_matchState_t* ms, +static size_t ZSTD_ldm_fillFastTables(ZSTD_MatchState_t* ms, void const* end) { const BYTE* const iend =3D (const BYTE*)end; @@ -242,11 +257,15 @@ static size_t ZSTD_ldm_fillFastTables(ZSTD_matchState= _t* ms, switch(ms->cParams.strategy) { case ZSTD_fast: - ZSTD_fillHashTable(ms, iend, ZSTD_dtlm_fast); + ZSTD_fillHashTable(ms, iend, ZSTD_dtlm_fast, ZSTD_tfp_forCCtx); break; =20 case ZSTD_dfast: - ZSTD_fillDoubleHashTable(ms, iend, ZSTD_dtlm_fast); +#ifndef ZSTD_EXCLUDE_DFAST_BLOCK_COMPRESSOR + ZSTD_fillDoubleHashTable(ms, iend, ZSTD_dtlm_fast, ZSTD_tfp_forCCt= x); +#else + assert(0); /* shouldn't be called: cparams should've been adjusted= . */ +#endif break; =20 case ZSTD_greedy: @@ -269,7 +288,8 @@ void ZSTD_ldm_fillHashTable( const BYTE* iend, ldmParams_t const* params) { U32 const minMatchLength =3D params->minMatchLength; - U32 const hBits =3D params->hashLog - params->bucketSizeLog; + U32 const bucketSizeLog =3D params->bucketSizeLog; + U32 const hBits =3D params->hashLog - bucketSizeLog; BYTE const* const base =3D ldmState->window.base; BYTE const* const istart =3D ip; ldmRollingHashState_t hashState; @@ -284,7 +304,7 @@ void ZSTD_ldm_fillHashTable( unsigned n; =20 numSplits =3D 0; - hashed =3D ZSTD_ldm_gear_feed(&hashState, ip, iend - ip, splits, &= numSplits); + hashed =3D ZSTD_ldm_gear_feed(&hashState, ip, (size_t)(iend - ip),= splits, &numSplits); =20 for (n =3D 0; n < numSplits; n++) { if (ip + splits[n] >=3D istart + minMatchLength) { @@ -295,7 +315,7 @@ void ZSTD_ldm_fillHashTable( =20 entry.offset =3D (U32)(split - base); entry.checksum =3D (U32)(xxhash >> 32); - ZSTD_ldm_insertEntry(ldmState, hash, entry, *params); + ZSTD_ldm_insertEntry(ldmState, hash, entry, params->bucket= SizeLog); } } =20 @@ -309,7 +329,7 @@ void ZSTD_ldm_fillHashTable( * Sets cctx->nextToUpdate to a position corresponding closer to anchor * if it is far away * (after a long match, only update tables a limited amount). 
*/ -static void ZSTD_ldm_limitTableUpdate(ZSTD_matchState_t* ms, const BYTE* a= nchor) +static void ZSTD_ldm_limitTableUpdate(ZSTD_MatchState_t* ms, const BYTE* a= nchor) { U32 const curr =3D (U32)(anchor - ms->window.base); if (curr > ms->nextToUpdate + 1024) { @@ -318,8 +338,10 @@ static void ZSTD_ldm_limitTableUpdate(ZSTD_matchState_= t* ms, const BYTE* anchor) } } =20 -static size_t ZSTD_ldm_generateSequences_internal( - ldmState_t* ldmState, rawSeqStore_t* rawSeqStore, +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t ZSTD_ldm_generateSequences_internal( + ldmState_t* ldmState, RawSeqStore_t* rawSeqStore, ldmParams_t const* params, void const* src, size_t srcSize) { /* LDM parameters */ @@ -373,7 +395,7 @@ static size_t ZSTD_ldm_generateSequences_internal( candidates[n].split =3D split; candidates[n].hash =3D hash; candidates[n].checksum =3D (U32)(xxhash >> 32); - candidates[n].bucket =3D ZSTD_ldm_getBucket(ldmState, hash, *p= arams); + candidates[n].bucket =3D ZSTD_ldm_getBucket(ldmState, hash, pa= rams->bucketSizeLog); PREFETCH_L1(candidates[n].bucket); } =20 @@ -396,7 +418,7 @@ static size_t ZSTD_ldm_generateSequences_internal( * the previous one, we merely register it in the hash table a= nd * move on */ if (split < anchor) { - ZSTD_ldm_insertEntry(ldmState, hash, newEntry, *params); + ZSTD_ldm_insertEntry(ldmState, hash, newEntry, params->buc= ketSizeLog); continue; } =20 @@ -443,7 +465,7 @@ static size_t ZSTD_ldm_generateSequences_internal( /* No match found -- insert an entry into the hash table * and process the next candidate match */ if (bestEntry =3D=3D NULL) { - ZSTD_ldm_insertEntry(ldmState, hash, newEntry, *params); + ZSTD_ldm_insertEntry(ldmState, hash, newEntry, params->buc= ketSizeLog); continue; } =20 @@ -464,7 +486,7 @@ static size_t ZSTD_ldm_generateSequences_internal( =20 /* Insert the current entry into the hash table --- it must be * done after the previous block to avoid clobbering bestEntry= */ - ZSTD_ldm_insertEntry(ldmState, hash, newEntry, *params); + ZSTD_ldm_insertEntry(ldmState, hash, newEntry, params->bucketS= izeLog); =20 anchor =3D split + forwardMatchLength; =20 @@ -503,7 +525,7 @@ static void ZSTD_ldm_reduceTable(ldmEntry_t* const tabl= e, U32 const size, } =20 size_t ZSTD_ldm_generateSequences( - ldmState_t* ldmState, rawSeqStore_t* sequences, + ldmState_t* ldmState, RawSeqStore_t* sequences, ldmParams_t const* params, void const* src, size_t srcSize) { U32 const maxDist =3D 1U << params->windowLog; @@ -549,7 +571,7 @@ size_t ZSTD_ldm_generateSequences( * the window through early invalidation. * TODO: * Test the chunk size. * * Try invalidation after the sequence generation and test= the - * the offset against maxDist directly. + * offset against maxDist directly. * * NOTE: Because of dictionaries + sequence splitting we MUST make= sure * that any offset used is valid at the END of the sequence, since= it may @@ -580,7 +602,7 @@ size_t ZSTD_ldm_generateSequences( } =20 void -ZSTD_ldm_skipSequences(rawSeqStore_t* rawSeqStore, size_t srcSize, U32 con= st minMatch) +ZSTD_ldm_skipSequences(RawSeqStore_t* rawSeqStore, size_t srcSize, U32 con= st minMatch) { while (srcSize > 0 && rawSeqStore->pos < rawSeqStore->size) { rawSeq* seq =3D rawSeqStore->seq + rawSeqStore->pos; @@ -616,7 +638,7 @@ ZSTD_ldm_skipSequences(rawSeqStore_t* rawSeqStore, size= _t srcSize, U32 const min * Returns the current sequence to handle, or if the rest of the block sho= uld * be literals, it returns a sequence with offset =3D=3D 0. 
*/ -static rawSeq maybeSplitSequence(rawSeqStore_t* rawSeqStore, +static rawSeq maybeSplitSequence(RawSeqStore_t* rawSeqStore, U32 const remaining, U32 const minMatch) { rawSeq sequence =3D rawSeqStore->seq[rawSeqStore->pos]; @@ -640,7 +662,7 @@ static rawSeq maybeSplitSequence(rawSeqStore_t* rawSeqS= tore, return sequence; } =20 -void ZSTD_ldm_skipRawSeqStoreBytes(rawSeqStore_t* rawSeqStore, size_t nbBy= tes) { +void ZSTD_ldm_skipRawSeqStoreBytes(RawSeqStore_t* rawSeqStore, size_t nbBy= tes) { U32 currPos =3D (U32)(rawSeqStore->posInSequence + nbBytes); while (currPos && rawSeqStore->pos < rawSeqStore->size) { rawSeq currSeq =3D rawSeqStore->seq[rawSeqStore->pos]; @@ -657,14 +679,14 @@ void ZSTD_ldm_skipRawSeqStoreBytes(rawSeqStore_t* raw= SeqStore, size_t nbBytes) { } } =20 -size_t ZSTD_ldm_blockCompress(rawSeqStore_t* rawSeqStore, - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], - ZSTD_paramSwitch_e useRowMatchFinder, +size_t ZSTD_ldm_blockCompress(RawSeqStore_t* rawSeqStore, + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_ParamSwitch_e useRowMatchFinder, void const* src, size_t srcSize) { const ZSTD_compressionParameters* const cParams =3D &ms->cParams; unsigned const minMatch =3D cParams->minMatch; - ZSTD_blockCompressor const blockCompressor =3D + ZSTD_BlockCompressor_f const blockCompressor =3D ZSTD_selectBlockCompressor(cParams->strategy, useRowMatchFinder, Z= STD_matchState_dictMode(ms)); /* Input bounds */ BYTE const* const istart =3D (BYTE const*)src; @@ -689,7 +711,6 @@ size_t ZSTD_ldm_blockCompress(rawSeqStore_t* rawSeqStor= e, /* maybeSplitSequence updates rawSeqStore->pos */ rawSeq const sequence =3D maybeSplitSequence(rawSeqStore, (U32)(iend - ip), minMa= tch); - int i; /* End signal */ if (sequence.offset =3D=3D 0) break; @@ -702,6 +723,7 @@ size_t ZSTD_ldm_blockCompress(rawSeqStore_t* rawSeqStor= e, /* Run the block compressor */ DEBUGLOG(5, "pos %u : calling block compressor on segment of size = %u", (unsigned)(ip-istart), sequence.litLength); { + int i; size_t const newLitLength =3D blockCompressor(ms, seqStore, rep, ip, sequence.litLength); ip +=3D sequence.litLength; @@ -711,7 +733,7 @@ size_t ZSTD_ldm_blockCompress(rawSeqStore_t* rawSeqStor= e, rep[0] =3D sequence.offset; /* Store the sequence */ ZSTD_storeSeq(seqStore, newLitLength, ip - newLitLength, iend, - STORE_OFFSET(sequence.offset), + OFFSET_TO_OFFBASE(sequence.offset), sequence.matchLength); ip +=3D sequence.matchLength; } diff --git a/lib/zstd/compress/zstd_ldm.h b/lib/zstd/compress/zstd_ldm.h index fbc6a5e88fd7..41400a7191b2 100644 --- a/lib/zstd/compress/zstd_ldm.h +++ b/lib/zstd/compress/zstd_ldm.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -11,7 +12,6 @@ #ifndef ZSTD_LDM_H #define ZSTD_LDM_H =20 - #include "zstd_compress_internal.h" /* ldmParams_t, U32 */ #include /* ZSTD_CCtx, size_t */ =20 @@ -40,7 +40,7 @@ void ZSTD_ldm_fillHashTable( * sequences. */ size_t ZSTD_ldm_generateSequences( - ldmState_t* ldms, rawSeqStore_t* sequences, + ldmState_t* ldms, RawSeqStore_t* sequences, ldmParams_t const* params, void const* src, size_t srcSize); =20 /* @@ -61,9 +61,9 @@ size_t ZSTD_ldm_generateSequences( * two. We handle that case correctly, and update `rawSeqStore` appropriat= ely. 
* NOTE: This function does not return any errors. */ -size_t ZSTD_ldm_blockCompress(rawSeqStore_t* rawSeqStore, - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_= NUM], - ZSTD_paramSwitch_e useRowMatchFinder, +size_t ZSTD_ldm_blockCompress(RawSeqStore_t* rawSeqStore, + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_= NUM], + ZSTD_ParamSwitch_e useRowMatchFinder, void const* src, size_t srcSize); =20 /* @@ -73,7 +73,7 @@ size_t ZSTD_ldm_blockCompress(rawSeqStore_t* rawSeqStore, * Avoids emitting matches less than `minMatch` bytes. * Must be called for data that is not passed to ZSTD_ldm_blockCompress(). */ -void ZSTD_ldm_skipSequences(rawSeqStore_t* rawSeqStore, size_t srcSize, +void ZSTD_ldm_skipSequences(RawSeqStore_t* rawSeqStore, size_t srcSize, U32 const minMatch); =20 /* ZSTD_ldm_skipRawSeqStoreBytes(): @@ -81,7 +81,7 @@ void ZSTD_ldm_skipSequences(rawSeqStore_t* rawSeqStore, s= ize_t srcSize, * Not to be used in conjunction with ZSTD_ldm_skipSequences(). * Must be called for data which is not passed to ZSTD_ldm_blockCompress(). */ -void ZSTD_ldm_skipRawSeqStoreBytes(rawSeqStore_t* rawSeqStore, size_t nbBy= tes); +void ZSTD_ldm_skipRawSeqStoreBytes(RawSeqStore_t* rawSeqStore, size_t nbBy= tes); =20 /* ZSTD_ldm_getTableSize() : * Estimate the space needed for long distance matching tables or 0 if LD= M is @@ -107,5 +107,4 @@ size_t ZSTD_ldm_getMaxNbSeq(ldmParams_t params, size_t = maxChunkSize); void ZSTD_ldm_adjustParameters(ldmParams_t* params, ZSTD_compressionParameters const* cParams); =20 - #endif /* ZSTD_FAST_H */ diff --git a/lib/zstd/compress/zstd_ldm_geartab.h b/lib/zstd/compress/zstd_= ldm_geartab.h index 647f865be290..cfccfc46f6f7 100644 --- a/lib/zstd/compress/zstd_ldm_geartab.h +++ b/lib/zstd/compress/zstd_ldm_geartab.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the diff --git a/lib/zstd/compress/zstd_opt.c b/lib/zstd/compress/zstd_opt.c index fd82acfda62f..b62fd1b0d83e 100644 --- a/lib/zstd/compress/zstd_opt.c +++ b/lib/zstd/compress/zstd_opt.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Przemyslaw Skibinski, Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -12,11 +13,14 @@ #include "hist.h" #include "zstd_opt.h" =20 +#if !defined(ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_BTOPT_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_BTULTRA_BLOCK_COMPRESSOR) =20 #define ZSTD_LITFREQ_ADD 2 /* scaling factor for litFreq, so that fre= quencies adapt faster to new stats */ #define ZSTD_MAX_PRICE (1<<30) =20 -#define ZSTD_PREDEF_THRESHOLD 1024 /* if srcSize < ZSTD_PREDEF_THRESHOLD= , symbols' cost is assumed static, directly determined by pre-defined distr= ibutions */ +#define ZSTD_PREDEF_THRESHOLD 8 /* if srcSize < ZSTD_PREDEF_THRESHOLD, s= ymbols' cost is assumed static, directly determined by pre-defined distribu= tions */ =20 =20 /*-************************************* @@ -26,27 +30,35 @@ #if 0 /* approximation at bit level (for tests) */ # define BITCOST_ACCURACY 0 # define BITCOST_MULTIPLIER (1 << BITCOST_ACCURACY) -# define WEIGHT(stat, opt) ((void)opt, ZSTD_bitWeight(stat)) +# define WEIGHT(stat, opt) ((void)(opt), ZSTD_bitWeight(stat)) #elif 0 /* fractional bit accuracy (for tests) */ # define BITCOST_ACCURACY 8 # define BITCOST_MULTIPLIER (1 << BITCOST_ACCURACY) -# define WEIGHT(stat,opt) ((void)opt, ZSTD_fracWeight(stat)) +# define WEIGHT(stat,opt) ((void)(opt), ZSTD_fracWeight(stat)) #else /* opt=3D=3Dapprox, ultra=3D=3Daccurate */ # define BITCOST_ACCURACY 8 # define BITCOST_MULTIPLIER (1 << BITCOST_ACCURACY) -# define WEIGHT(stat,opt) (opt ? ZSTD_fracWeight(stat) : ZSTD_bitWeight(s= tat)) +# define WEIGHT(stat,opt) ((opt) ? ZSTD_fracWeight(stat) : ZSTD_bitWeight= (stat)) #endif =20 +/* ZSTD_bitWeight() : + * provide estimated "cost" of a stat in full bits only */ MEM_STATIC U32 ZSTD_bitWeight(U32 stat) { return (ZSTD_highbit32(stat+1) * BITCOST_MULTIPLIER); } =20 +/* ZSTD_fracWeight() : + * provide fractional-bit "cost" of a stat, + * using linear interpolation approximation */ MEM_STATIC U32 ZSTD_fracWeight(U32 rawStat) { U32 const stat =3D rawStat + 1; U32 const hb =3D ZSTD_highbit32(stat); U32 const BWeight =3D hb * BITCOST_MULTIPLIER; + /* Fweight was meant for "Fractional weight" + * but it's effectively a value between 1 and 2 + * using fixed point arithmetic */ U32 const FWeight =3D (stat << BITCOST_ACCURACY) >> hb; U32 const weight =3D BWeight + FWeight; assert(hb + BITCOST_ACCURACY < 31); @@ -57,7 +69,7 @@ MEM_STATIC U32 ZSTD_fracWeight(U32 rawStat) /* debugging function, * @return price in bytes as fractional value * for debug messages only */ -MEM_STATIC double ZSTD_fCost(U32 price) +MEM_STATIC double ZSTD_fCost(int price) { return (double)price / (BITCOST_MULTIPLIER*8); } @@ -88,20 +100,26 @@ static U32 sum_u32(const unsigned table[], size_t nbEl= ts) return total; } =20 -static U32 ZSTD_downscaleStats(unsigned* table, U32 lastEltIndex, U32 shif= t) +typedef enum { base_0possible=3D0, base_1guaranteed=3D1 } base_directive_e; + +static U32 +ZSTD_downscaleStats(unsigned* table, U32 lastEltIndex, U32 shift, base_dir= ective_e base1) { U32 s, sum=3D0; - DEBUGLOG(5, "ZSTD_downscaleStats (nbElts=3D%u, shift=3D%u)", (unsigned= )lastEltIndex+1, (unsigned)shift); + DEBUGLOG(5, "ZSTD_downscaleStats (nbElts=3D%u, shift=3D%u)", + (unsigned)lastEltIndex+1, (unsigned)shift ); assert(shift < 30); for (s=3D0; s> shift); - sum +=3D table[s]; + unsigned const base =3D base1 ? 
1 : (table[s]>0); + unsigned const newStat =3D base + (table[s] >> shift); + sum +=3D newStat; + table[s] =3D newStat; } return sum; } =20 /* ZSTD_scaleStats() : - * reduce all elements in table is sum too large + * reduce all elt frequencies in table if sum too large * return the resulting sum of elements */ static U32 ZSTD_scaleStats(unsigned* table, U32 lastEltIndex, U32 logTarge= t) { @@ -110,7 +128,7 @@ static U32 ZSTD_scaleStats(unsigned* table, U32 lastElt= Index, U32 logTarget) DEBUGLOG(5, "ZSTD_scaleStats (nbElts=3D%u, target=3D%u)", (unsigned)la= stEltIndex+1, (unsigned)logTarget); assert(logTarget < 30); if (factor <=3D 1) return prevsum; - return ZSTD_downscaleStats(table, lastEltIndex, ZSTD_highbit32(factor)= ); + return ZSTD_downscaleStats(table, lastEltIndex, ZSTD_highbit32(factor)= , base_1guaranteed); } =20 /* ZSTD_rescaleFreqs() : @@ -129,18 +147,22 @@ ZSTD_rescaleFreqs(optState_t* const optPtr, DEBUGLOG(5, "ZSTD_rescaleFreqs (srcSize=3D%u)", (unsigned)srcSize); optPtr->priceType =3D zop_dynamic; =20 - if (optPtr->litLengthSum =3D=3D 0) { /* first block : init */ - if (srcSize <=3D ZSTD_PREDEF_THRESHOLD) { /* heuristic */ - DEBUGLOG(5, "(srcSize <=3D ZSTD_PREDEF_THRESHOLD) =3D> zop_pre= def"); + if (optPtr->litLengthSum =3D=3D 0) { /* no literals stats collected -= > first block assumed -> init */ + + /* heuristic: use pre-defined stats for too small inputs */ + if (srcSize <=3D ZSTD_PREDEF_THRESHOLD) { + DEBUGLOG(5, "srcSize <=3D %i : use predefined stats", ZSTD_PRE= DEF_THRESHOLD); optPtr->priceType =3D zop_predef; } =20 assert(optPtr->symbolCosts !=3D NULL); if (optPtr->symbolCosts->huf.repeatMode =3D=3D HUF_repeat_valid) { - /* huffman table presumed generated by dictionary */ + + /* huffman stats covering the full value set : table presumed = generated by dictionary */ optPtr->priceType =3D zop_dynamic; =20 if (compressedLiterals) { + /* generate literals statistics from huffman table */ unsigned lit; assert(optPtr->litFreq !=3D NULL); optPtr->litSum =3D 0; @@ -188,13 +210,14 @@ ZSTD_rescaleFreqs(optState_t* const optPtr, optPtr->offCodeSum +=3D optPtr->offCodeFreq[of]; } } =20 - } else { /* not a dictionary */ + } else { /* first block, no dictionary */ =20 assert(optPtr->litFreq !=3D NULL); if (compressedLiterals) { + /* base initial cost of literals on direct frequency withi= n src */ unsigned lit =3D MaxLit; HIST_count_simple(optPtr->litFreq, &lit, src, srcSize); = /* use raw first block to init statistics */ - optPtr->litSum =3D ZSTD_downscaleStats(optPtr->litFreq, Ma= xLit, 8); + optPtr->litSum =3D ZSTD_downscaleStats(optPtr->litFreq, Ma= xLit, 8, base_0possible); } =20 { unsigned const baseLLfreqs[MaxLL+1] =3D { @@ -224,10 +247,9 @@ ZSTD_rescaleFreqs(optState_t* const optPtr, optPtr->offCodeSum =3D sum_u32(baseOFCfreqs, MaxOff+1); } =20 - } =20 - } else { /* new block : re-use previous statistics, scaled down */ + } else { /* new block : scale down accumulated statistics */ =20 if (compressedLiterals) optPtr->litSum =3D ZSTD_scaleStats(optPtr->litFreq, MaxLit, 12= ); @@ -246,6 +268,7 @@ static U32 ZSTD_rawLiteralsCost(const BYTE* const liter= als, U32 const litLength, const optState_t* const optPtr, int optLevel) { + DEBUGLOG(8, "ZSTD_rawLiteralsCost (%u literals)", litLength); if (litLength =3D=3D 0) return 0; =20 if (!ZSTD_compressedLiterals(optPtr)) @@ -255,11 +278,14 @@ static U32 ZSTD_rawLiteralsCost(const BYTE* const lit= erals, U32 const litLength, return (litLength*6) * BITCOST_MULTIPLIER; /* 6 bit per literal -= no statistic used */ =20 /* dynamic 
statistics */ - { U32 price =3D litLength * optPtr->litSumBasePrice; + { U32 price =3D optPtr->litSumBasePrice * litLength; + U32 const litPriceMax =3D optPtr->litSumBasePrice - BITCOST_MULTIP= LIER; U32 u; + assert(optPtr->litSumBasePrice >=3D BITCOST_MULTIPLIER); for (u=3D0; u < litLength; u++) { - assert(WEIGHT(optPtr->litFreq[literals[u]], optLevel) <=3D opt= Ptr->litSumBasePrice); /* literal cost should never be negative */ - price -=3D WEIGHT(optPtr->litFreq[literals[u]], optLevel); + U32 litPrice =3D WEIGHT(optPtr->litFreq[literals[u]], optLevel= ); + if (UNLIKELY(litPrice > litPriceMax)) litPrice =3D litPriceMax; + price -=3D litPrice; } return price; } @@ -272,10 +298,11 @@ static U32 ZSTD_litLengthPrice(U32 const litLength, c= onst optState_t* const optP assert(litLength <=3D ZSTD_BLOCKSIZE_MAX); if (optPtr->priceType =3D=3D zop_predef) return WEIGHT(litLength, optLevel); - /* We can't compute the litLength price for sizes >=3D ZSTD_BLOCKSIZE_= MAX - * because it isn't representable in the zstd format. So instead just - * call it 1 bit more than ZSTD_BLOCKSIZE_MAX - 1. In this case the bl= ock - * would be all literals. + + /* ZSTD_LLcode() can't compute litLength price for sizes >=3D ZSTD_BLO= CKSIZE_MAX + * because it isn't representable in the zstd format. + * So instead just pretend it would cost 1 bit more than ZSTD_BLOCKSIZ= E_MAX - 1. + * In such a case, the block would be all literals. */ if (litLength =3D=3D ZSTD_BLOCKSIZE_MAX) return BITCOST_MULTIPLIER + ZSTD_litLengthPrice(ZSTD_BLOCKSIZE_MAX= - 1, optPtr, optLevel); @@ -289,24 +316,25 @@ static U32 ZSTD_litLengthPrice(U32 const litLength, c= onst optState_t* const optP } =20 /* ZSTD_getMatchPrice() : - * Provides the cost of the match part (offset + matchLength) of a sequence + * Provides the cost of the match part (offset + matchLength) of a sequenc= e. * Must be combined with ZSTD_fullLiteralsCost() to get the full cost of a= sequence. 
- * @offcode : expects a scale where 0,1,2 are repcodes 1-3, and 3+ are rea= l_offsets+2 + * @offBase : sumtype, representing an offset or a repcode, and using nume= ric representation of ZSTD_storeSeq() * @optLevel: when <2, favors small offset for decompression speed (improv= ed cache efficiency) */ FORCE_INLINE_TEMPLATE U32 -ZSTD_getMatchPrice(U32 const offcode, +ZSTD_getMatchPrice(U32 const offBase, U32 const matchLength, const optState_t* const optPtr, int const optLevel) { U32 price; - U32 const offCode =3D ZSTD_highbit32(STORED_TO_OFFBASE(offcode)); + U32 const offCode =3D ZSTD_highbit32(offBase); U32 const mlBase =3D matchLength - MINMATCH; assert(matchLength >=3D MINMATCH); =20 - if (optPtr->priceType =3D=3D zop_predef) /* fixed scheme, do not use = statistics */ - return WEIGHT(mlBase, optLevel) + ((16 + offCode) * BITCOST_MULTIP= LIER); + if (optPtr->priceType =3D=3D zop_predef) /* fixed scheme, does not us= e statistics */ + return WEIGHT(mlBase, optLevel) + + ((16 + offCode) * BITCOST_MULTIPLIER); /* emulated offset c= ost */ =20 /* dynamic statistics */ price =3D (offCode * BITCOST_MULTIPLIER) + (optPtr->offCodeSumBasePric= e - WEIGHT(optPtr->offCodeFreq[offCode], optLevel)); @@ -325,10 +353,10 @@ ZSTD_getMatchPrice(U32 const offcode, } =20 /* ZSTD_updateStats() : - * assumption : literals + litLengtn <=3D iend */ + * assumption : literals + litLength <=3D iend */ static void ZSTD_updateStats(optState_t* const optPtr, U32 litLength, const BYTE* literals, - U32 offsetCode, U32 matchLength) + U32 offBase, U32 matchLength) { /* literals */ if (ZSTD_compressedLiterals(optPtr)) { @@ -344,8 +372,8 @@ static void ZSTD_updateStats(optState_t* const optPtr, optPtr->litLengthSum++; } =20 - /* offset code : expected to follow storeSeq() numeric representation = */ - { U32 const offCode =3D ZSTD_highbit32(STORED_TO_OFFBASE(offsetCode)= ); + /* offset code : follows storeSeq() numeric representation */ + { U32 const offCode =3D ZSTD_highbit32(offBase); assert(offCode <=3D MaxOff); optPtr->offCodeFreq[offCode]++; optPtr->offCodeSum++; @@ -379,9 +407,11 @@ MEM_STATIC U32 ZSTD_readMINMATCH(const void* memPtr, U= 32 length) =20 /* Update hashTable3 up to ip (excluded) Assumption : always within prefix (i.e. not within extDict) */ -static U32 ZSTD_insertAndFindFirstIndexHash3 (const ZSTD_matchState_t* ms, - U32* nextToUpdate3, - const BYTE* const ip) +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +U32 ZSTD_insertAndFindFirstIndexHash3 (const ZSTD_MatchState_t* ms, + U32* nextToUpdate3, + const BYTE* const ip) { U32* const hashTable3 =3D ms->hashTable3; U32 const hashLog3 =3D ms->hashLog3; @@ -408,8 +438,10 @@ static U32 ZSTD_insertAndFindFirstIndexHash3 (const ZS= TD_matchState_t* ms, * @param ip assumed <=3D iend-8 . 
* @param target The target of ZSTD_updateTree_internal() - we are filling= to this position * @return : nb of positions added */ -static U32 ZSTD_insertBt1( - const ZSTD_matchState_t* ms, +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +U32 ZSTD_insertBt1( + const ZSTD_MatchState_t* ms, const BYTE* const ip, const BYTE* const iend, U32 const target, U32 const mls, const int extDict) @@ -527,15 +559,16 @@ static U32 ZSTD_insertBt1( } =20 FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR void ZSTD_updateTree_internal( - ZSTD_matchState_t* ms, + ZSTD_MatchState_t* ms, const BYTE* const ip, const BYTE* const iend, const U32 mls, const ZSTD_dictMode_e dictMode) { const BYTE* const base =3D ms->window.base; U32 const target =3D (U32)(ip - base); U32 idx =3D ms->nextToUpdate; - DEBUGLOG(6, "ZSTD_updateTree_internal, from %u to %u (dictMode:%u)", + DEBUGLOG(7, "ZSTD_updateTree_internal, from %u to %u (dictMode:%u)", idx, target, dictMode); =20 while(idx < target) { @@ -548,20 +581,23 @@ void ZSTD_updateTree_internal( ms->nextToUpdate =3D target; } =20 -void ZSTD_updateTree(ZSTD_matchState_t* ms, const BYTE* ip, const BYTE* ie= nd) { +void ZSTD_updateTree(ZSTD_MatchState_t* ms, const BYTE* ip, const BYTE* ie= nd) { ZSTD_updateTree_internal(ms, ip, iend, ms->cParams.minMatch, ZSTD_noDi= ct); } =20 FORCE_INLINE_TEMPLATE -U32 ZSTD_insertBtAndGetAllMatches ( - ZSTD_match_t* matches, /* store result (found matche= s) in this table (presumed large enough) */ - ZSTD_matchState_t* ms, - U32* nextToUpdate3, - const BYTE* const ip, const BYTE* const iLimit, const = ZSTD_dictMode_e dictMode, - const U32 rep[ZSTD_REP_NUM], - U32 const ll0, /* tells if associated literal length= is 0 or not. This value must be 0 or 1 */ - const U32 lengthToBeat, - U32 const mls /* template */) +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +U32 +ZSTD_insertBtAndGetAllMatches ( + ZSTD_match_t* matches, /* store result (found matches) in= this table (presumed large enough) */ + ZSTD_MatchState_t* ms, + U32* nextToUpdate3, + const BYTE* const ip, const BYTE* const iLimit, + const ZSTD_dictMode_e dictMode, + const U32 rep[ZSTD_REP_NUM], + const U32 ll0, /* tells if associated literal length is 0= or not. This value must be 0 or 1 */ + const U32 lengthToBeat, + const U32 mls /* template */) { const ZSTD_compressionParameters* const cParams =3D &ms->cParams; U32 const sufficient_len =3D MIN(cParams->targetLength, ZSTD_OPT_NUM -= 1); @@ -590,7 +626,7 @@ U32 ZSTD_insertBtAndGetAllMatches ( U32 mnum =3D 0; U32 nbCompares =3D 1U << cParams->searchLog; =20 - const ZSTD_matchState_t* dms =3D dictMode =3D=3D ZSTD_dictMatchStat= e ? ms->dictMatchState : NULL; + const ZSTD_MatchState_t* dms =3D dictMode =3D=3D ZSTD_dictMatchStat= e ? ms->dictMatchState : NULL; const ZSTD_compressionParameters* const dmsCParams =3D dictMode =3D=3D ZSTD_dictMatchState = ? &dms->cParams : NULL; const BYTE* const dmsBase =3D dictMode =3D=3D ZSTD_dictMatchStat= e ? 
dms->window.base : NULL; @@ -629,13 +665,13 @@ U32 ZSTD_insertBtAndGetAllMatches ( assert(curr >=3D windowLow); if ( dictMode =3D=3D ZSTD_extDict && ( ((repOffset-1) /*intentional overflow*/ < curr - wi= ndowLow) /* equivalent to `curr > repIndex >=3D windowLow` */ - & (((U32)((dictLimit-1) - repIndex) >=3D 3) ) /* inte= ntional overflow : do not test positions overlapping 2 memory segments */) + & (ZSTD_index_overlap_check(dictLimit, repIndex)) ) && (ZSTD_readMINMATCH(ip, minMatch) =3D=3D ZSTD_readMINM= ATCH(repMatch, minMatch)) ) { repLen =3D (U32)ZSTD_count_2segments(ip+minMatch, repM= atch+minMatch, iLimit, dictEnd, prefixStart) + minMatch; } if (dictMode =3D=3D ZSTD_dictMatchState && ( ((repOffset-1) /*intentional overflow*/ < curr - (d= msLowLimit + dmsIndexDelta)) /* equivalent to `curr > repIndex >=3D dmsLow= Limit` */ - & ((U32)((dictLimit-1) - repIndex) >=3D 3) ) /* inten= tional overflow : do not test positions overlapping 2 memory segments */ + & (ZSTD_index_overlap_check(dictLimit, repIndex)) ) && (ZSTD_readMINMATCH(ip, minMatch) =3D=3D ZSTD_readMINM= ATCH(repMatch, minMatch)) ) { repLen =3D (U32)ZSTD_count_2segments(ip+minMatch, repM= atch+minMatch, iLimit, dmsEnd, prefixStart) + minMatch; } } @@ -644,7 +680,7 @@ U32 ZSTD_insertBtAndGetAllMatches ( DEBUGLOG(8, "found repCode %u (ll0:%u, offset:%u) of lengt= h %u", repCode, ll0, repOffset, repLen); bestLength =3D repLen; - matches[mnum].off =3D STORE_REPCODE(repCode - ll0 + 1); /= * expect value between 1 and 3 */ + matches[mnum].off =3D REPCODE_TO_OFFBASE(repCode - ll0 + 1= ); /* expect value between 1 and 3 */ matches[mnum].len =3D (U32)repLen; mnum++; if ( (repLen > sufficient_len) @@ -673,7 +709,7 @@ U32 ZSTD_insertBtAndGetAllMatches ( bestLength =3D mlen; assert(curr > matchIndex3); assert(mnum=3D=3D0); /* no prior solution */ - matches[0].off =3D STORE_OFFSET(curr - matchIndex3); + matches[0].off =3D OFFSET_TO_OFFBASE(curr - matchIndex3); matches[0].len =3D (U32)mlen; mnum =3D 1; if ( (mlen > sufficient_len) | @@ -706,13 +742,13 @@ U32 ZSTD_insertBtAndGetAllMatches ( } =20 if (matchLength > bestLength) { - DEBUGLOG(8, "found match of length %u at distance %u (offCode= =3D%u)", - (U32)matchLength, curr - matchIndex, STORE_OFFSET(curr= - matchIndex)); + DEBUGLOG(8, "found match of length %u at distance %u (offBase= =3D%u)", + (U32)matchLength, curr - matchIndex, OFFSET_TO_OFFBASE= (curr - matchIndex)); assert(matchEndIdx > matchIndex); if (matchLength > matchEndIdx - matchIndex) matchEndIdx =3D matchIndex + (U32)matchLength; bestLength =3D matchLength; - matches[mnum].off =3D STORE_OFFSET(curr - matchIndex); + matches[mnum].off =3D OFFSET_TO_OFFBASE(curr - matchIndex); matches[mnum].len =3D (U32)matchLength; mnum++; if ( (matchLength > ZSTD_OPT_NUM) @@ -754,12 +790,12 @@ U32 ZSTD_insertBtAndGetAllMatches ( =20 if (matchLength > bestLength) { matchIndex =3D dictMatchIndex + dmsIndexDelta; - DEBUGLOG(8, "found dms match of length %u at distance %u (= offCode=3D%u)", - (U32)matchLength, curr - matchIndex, STORE_OFFSET(= curr - matchIndex)); + DEBUGLOG(8, "found dms match of length %u at distance %u (= offBase=3D%u)", + (U32)matchLength, curr - matchIndex, OFFSET_TO_OFF= BASE(curr - matchIndex)); if (matchLength > matchEndIdx - matchIndex) matchEndIdx =3D matchIndex + (U32)matchLength; bestLength =3D matchLength; - matches[mnum].off =3D STORE_OFFSET(curr - matchIndex); + matches[mnum].off =3D OFFSET_TO_OFFBASE(curr - matchIndex); matches[mnum].len =3D (U32)matchLength; mnum++; if ( (matchLength > ZSTD_OPT_NUM) @@ -784,7 +820,7 
@@ U32 ZSTD_insertBtAndGetAllMatches ( =20 typedef U32 (*ZSTD_getAllMatchesFn)( ZSTD_match_t*, - ZSTD_matchState_t*, + ZSTD_MatchState_t*, U32*, const BYTE*, const BYTE*, @@ -792,9 +828,11 @@ typedef U32 (*ZSTD_getAllMatchesFn)( U32 const ll0, U32 const lengthToBeat); =20 -FORCE_INLINE_TEMPLATE U32 ZSTD_btGetAllMatches_internal( +FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +U32 ZSTD_btGetAllMatches_internal( ZSTD_match_t* matches, - ZSTD_matchState_t* ms, + ZSTD_MatchState_t* ms, U32* nextToUpdate3, const BYTE* ip, const BYTE* const iHighLimit, @@ -817,7 +855,7 @@ FORCE_INLINE_TEMPLATE U32 ZSTD_btGetAllMatches_internal( #define GEN_ZSTD_BT_GET_ALL_MATCHES_(dictMode, mls) \ static U32 ZSTD_BT_GET_ALL_MATCHES_FN(dictMode, mls)( \ ZSTD_match_t* matches, \ - ZSTD_matchState_t* ms, \ + ZSTD_MatchState_t* ms, \ U32* nextToUpdate3, \ const BYTE* ip, \ const BYTE* const iHighLimit, \ @@ -849,7 +887,7 @@ GEN_ZSTD_BT_GET_ALL_MATCHES(dictMatchState) } =20 static ZSTD_getAllMatchesFn -ZSTD_selectBtGetAllMatches(ZSTD_matchState_t const* ms, ZSTD_dictMode_e co= nst dictMode) +ZSTD_selectBtGetAllMatches(ZSTD_MatchState_t const* ms, ZSTD_dictMode_e co= nst dictMode) { ZSTD_getAllMatchesFn const getAllMatchesFns[3][4] =3D { ZSTD_BT_GET_ALL_MATCHES_ARRAY(noDict), @@ -868,7 +906,7 @@ ZSTD_selectBtGetAllMatches(ZSTD_matchState_t const* ms,= ZSTD_dictMode_e const di =20 /* Struct containing info needed to make decision about ldm inclusion */ typedef struct { - rawSeqStore_t seqStore; /* External match candidates store for this = block */ + RawSeqStore_t seqStore; /* External match candidates store for this = block */ U32 startPosInBlock; /* Start position of the current match candi= date */ U32 endPosInBlock; /* End position of the current match candida= te */ U32 offset; /* Offset of the match candidate */ @@ -878,7 +916,7 @@ typedef struct { * Moves forward in @rawSeqStore by @nbBytes, * which will update the fields 'pos' and 'posInSequence'. */ -static void ZSTD_optLdm_skipRawSeqStoreBytes(rawSeqStore_t* rawSeqStore, s= ize_t nbBytes) +static void ZSTD_optLdm_skipRawSeqStoreBytes(RawSeqStore_t* rawSeqStore, s= ize_t nbBytes) { U32 currPos =3D (U32)(rawSeqStore->posInSequence + nbBytes); while (currPos && rawSeqStore->pos < rawSeqStore->size) { @@ -935,7 +973,7 @@ ZSTD_opt_getNextMatchAndUpdateSeqStore(ZSTD_optLdm_t* o= ptLdm, U32 currPosInBlock return; } =20 - /* Matches may be < MINMATCH by this process. In that case, we will re= ject them + /* Matches may be < minMatch by this process. In that case, we will re= ject them when we are deciding whether or not to add the ldm */ optLdm->startPosInBlock =3D currPosInBlock + literalsBytesRemaining; optLdm->endPosInBlock =3D optLdm->startPosInBlock + matchBytesRemainin= g; @@ -957,25 +995,26 @@ ZSTD_opt_getNextMatchAndUpdateSeqStore(ZSTD_optLdm_t*= optLdm, U32 currPosInBlock * into 'matches'. Maintains the correct ordering of 'matches'. 
*/ static void ZSTD_optLdm_maybeAddMatch(ZSTD_match_t* matches, U32* nbMatche= s, - const ZSTD_optLdm_t* optLdm, U32 cur= rPosInBlock) + const ZSTD_optLdm_t* optLdm, U32 cur= rPosInBlock, + U32 minMatch) { U32 const posDiff =3D currPosInBlock - optLdm->startPosInBlock; - /* Note: ZSTD_match_t actually contains offCode and matchLength (befor= e subtracting MINMATCH) */ + /* Note: ZSTD_match_t actually contains offBase and matchLength (befor= e subtracting MINMATCH) */ U32 const candidateMatchLength =3D optLdm->endPosInBlock - optLdm->sta= rtPosInBlock - posDiff; =20 /* Ensure that current block position is not outside of the match */ if (currPosInBlock < optLdm->startPosInBlock || currPosInBlock >=3D optLdm->endPosInBlock - || candidateMatchLength < MINMATCH) { + || candidateMatchLength < minMatch) { return; } =20 if (*nbMatches =3D=3D 0 || ((candidateMatchLength > matches[*nbMatches= -1].len) && *nbMatches < ZSTD_OPT_NUM)) { - U32 const candidateOffCode =3D STORE_OFFSET(optLdm->offset); - DEBUGLOG(6, "ZSTD_optLdm_maybeAddMatch(): Adding ldm candidate mat= ch (offCode: %u matchLength %u) at block position=3D%u", - candidateOffCode, candidateMatchLength, currPosInBlock); + U32 const candidateOffBase =3D OFFSET_TO_OFFBASE(optLdm->offset); + DEBUGLOG(6, "ZSTD_optLdm_maybeAddMatch(): Adding ldm candidate mat= ch (offBase: %u matchLength %u) at block position=3D%u", + candidateOffBase, candidateMatchLength, currPosInBlock); matches[*nbMatches].len =3D candidateMatchLength; - matches[*nbMatches].off =3D candidateOffCode; + matches[*nbMatches].off =3D candidateOffBase; (*nbMatches)++; } } @@ -986,7 +1025,8 @@ static void ZSTD_optLdm_maybeAddMatch(ZSTD_match_t* ma= tches, U32* nbMatches, static void ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* optLdm, ZSTD_match_t* matches, U32* nbMatches, - U32 currPosInBlock, U32 remainingBytes) + U32 currPosInBlock, U32 remainingBytes, + U32 minMatch) { if (optLdm->seqStore.size =3D=3D 0 || optLdm->seqStore.pos >=3D optLdm= ->seqStore.size) { return; @@ -1003,7 +1043,7 @@ ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* optL= dm, } ZSTD_opt_getNextMatchAndUpdateSeqStore(optLdm, currPosInBlock, rem= ainingBytes); } - ZSTD_optLdm_maybeAddMatch(matches, nbMatches, optLdm, currPosInBlock); + ZSTD_optLdm_maybeAddMatch(matches, nbMatches, optLdm, currPosInBlock, = minMatch); } =20 =20 @@ -1011,11 +1051,6 @@ ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* opt= Ldm, * Optimal parser *********************************/ =20 -static U32 ZSTD_totalLen(ZSTD_optimal_t sol) -{ - return sol.litlen + sol.mlen; -} - #if 0 /* debug */ =20 static void @@ -1033,9 +1068,15 @@ listStats(const U32* table, int lastEltID) =20 #endif =20 -FORCE_INLINE_TEMPLATE size_t -ZSTD_compressBlock_opt_generic(ZSTD_matchState_t* ms, - seqStore_t* seqStore, +#define LIT_PRICE(_p) (int)ZSTD_rawLiteralsCost(_p, 1, optStatePtr, optLev= el) +#define LL_PRICE(_l) (int)ZSTD_litLengthPrice(_l, optStatePtr, optLevel) +#define LL_INCPRICE(_l) (LL_PRICE(_l) - LL_PRICE(_l-1)) + +FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t +ZSTD_compressBlock_opt_generic(ZSTD_MatchState_t* ms, + SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize, const int optLevel, @@ -1059,9 +1100,11 @@ ZSTD_compressBlock_opt_generic(ZSTD_matchState_t* ms, =20 ZSTD_optimal_t* const opt =3D optStatePtr->priceTable; ZSTD_match_t* const matches =3D optStatePtr->matchTable; - ZSTD_optimal_t lastSequence; + ZSTD_optimal_t lastStretch; ZSTD_optLdm_t optLdm; =20 + ZSTD_memset(&lastStretch, 0, 
sizeof(ZSTD_optimal_t)); + optLdm.seqStore =3D ms->ldmSeqStore ? *ms->ldmSeqStore : kNullRawSeqSt= ore; optLdm.endPosInBlock =3D optLdm.startPosInBlock =3D optLdm.offset =3D = 0; ZSTD_opt_getNextMatchAndUpdateSeqStore(&optLdm, (U32)(ip-istart), (U32= )(iend-ip)); @@ -1082,103 +1125,140 @@ ZSTD_compressBlock_opt_generic(ZSTD_matchState_t*= ms, U32 const ll0 =3D !litlen; U32 nbMatches =3D getAllMatches(matches, ms, &nextToUpdate3, i= p, iend, rep, ll0, minMatch); ZSTD_optLdm_processMatchCandidate(&optLdm, matches, &nbMatches, - (U32)(ip-istart), (U32)(iend= - ip)); - if (!nbMatches) { ip++; continue; } + (U32)(ip-istart), (U32)(iend= -ip), + minMatch); + if (!nbMatches) { + DEBUGLOG(8, "no match found at cPos %u", (unsigned)(ip-ist= art)); + ip++; + continue; + } + + /* Match found: let's store this solution, and eventually find= more candidates. + * During this forward pass, @opt is used to store stretches, + * defined as "a match followed by N literals". + * Note how this is different from a Sequence, which is "N lit= erals followed by a match". + * Storing stretches allows us to store different match predec= essors + * for each literal position part of a literals run. */ =20 /* initialize opt[0] */ - { U32 i ; for (i=3D0; i immediate encoding */ { U32 const maxML =3D matches[nbMatches-1].len; - U32 const maxOffcode =3D matches[nbMatches-1].off; - DEBUGLOG(6, "found %u matches of maxLength=3D%u and maxOff= Code=3D%u at cPos=3D%u =3D> start new series", - nbMatches, maxML, maxOffcode, (U32)(ip-prefixS= tart)); + U32 const maxOffBase =3D matches[nbMatches-1].off; + DEBUGLOG(6, "found %u matches of maxLength=3D%u and maxOff= Base=3D%u at cPos=3D%u =3D> start new series", + nbMatches, maxML, maxOffBase, (U32)(ip-prefixS= tart)); =20 if (maxML > sufficient_len) { - lastSequence.litlen =3D litlen; - lastSequence.mlen =3D maxML; - lastSequence.off =3D maxOffcode; - DEBUGLOG(6, "large match (%u>%u), immediate encoding", + lastStretch.litlen =3D 0; + lastStretch.mlen =3D maxML; + lastStretch.off =3D maxOffBase; + DEBUGLOG(6, "large match (%u>%u) =3D> immediate encodi= ng", maxML, sufficient_len); cur =3D 0; - last_pos =3D ZSTD_totalLen(lastSequence); + last_pos =3D maxML; goto _shortestPath; } } =20 /* set prices for first matches starting position =3D=3D 0 */ assert(opt[0].price >=3D 0); - { U32 const literalsPrice =3D (U32)opt[0].price + ZSTD_litLe= ngthPrice(0, optStatePtr, optLevel); - U32 pos; + { U32 pos; U32 matchNb; for (pos =3D 1; pos < minMatch; pos++) { - opt[pos].price =3D ZSTD_MAX_PRICE; /* mlen, litlen a= nd price will be fixed during forward scanning */ + opt[pos].price =3D ZSTD_MAX_PRICE; + opt[pos].mlen =3D 0; + opt[pos].litlen =3D litlen + pos; } for (matchNb =3D 0; matchNb < nbMatches; matchNb++) { - U32 const offcode =3D matches[matchNb].off; + U32 const offBase =3D matches[matchNb].off; U32 const end =3D matches[matchNb].len; for ( ; pos <=3D end ; pos++ ) { - U32 const matchPrice =3D ZSTD_getMatchPrice(offcod= e, pos, optStatePtr, optLevel); - U32 const sequencePrice =3D literalsPrice + matchP= rice; + int const matchPrice =3D (int)ZSTD_getMatchPrice(o= ffBase, pos, optStatePtr, optLevel); + int const sequencePrice =3D opt[0].price + matchPr= ice; DEBUGLOG(7, "rPos:%u =3D> set initial price : %.2f= ", pos, ZSTD_fCost(sequencePrice)); opt[pos].mlen =3D pos; - opt[pos].off =3D offcode; - opt[pos].litlen =3D litlen; - opt[pos].price =3D (int)sequencePrice; - } } + opt[pos].off =3D offBase; + opt[pos].litlen =3D 0; /* end of match */ + opt[pos].price =3D sequencePrice + 
LL_PRICE(0); + } + } last_pos =3D pos-1; + opt[pos].price =3D ZSTD_MAX_PRICE; } } =20 /* check further positions */ for (cur =3D 1; cur <=3D last_pos; cur++) { const BYTE* const inr =3D ip + cur; - assert(cur < ZSTD_OPT_NUM); - DEBUGLOG(7, "cPos:%zi=3D=3DrPos:%u", inr-istart, cur) + assert(cur <=3D ZSTD_OPT_NUM); + DEBUGLOG(7, "cPos:%i=3D=3DrPos:%u", (int)(inr-istart), cur); =20 /* Fix current position with one literal if cheaper */ - { U32 const litlen =3D (opt[cur-1].mlen =3D=3D 0) ? opt[cur-= 1].litlen + 1 : 1; + { U32 const litlen =3D opt[cur-1].litlen + 1; int const price =3D opt[cur-1].price - + (int)ZSTD_rawLiteralsCost(ip+cur-1, 1, o= ptStatePtr, optLevel) - + (int)ZSTD_litLengthPrice(litlen, optStat= ePtr, optLevel) - - (int)ZSTD_litLengthPrice(litlen-1, optSt= atePtr, optLevel); + + LIT_PRICE(ip+cur-1) + + LL_INCPRICE(litlen); assert(price < 1000000000); /* overflow check */ if (price <=3D opt[cur].price) { - DEBUGLOG(7, "cPos:%zi=3D=3DrPos:%u : better price (%.2= f<=3D%.2f) using literal (ll=3D=3D%u) (hist:%u,%u,%u)", - inr-istart, cur, ZSTD_fCost(price), ZSTD_f= Cost(opt[cur].price), litlen, + ZSTD_optimal_t const prevMatch =3D opt[cur]; + DEBUGLOG(7, "cPos:%i=3D=3DrPos:%u : better price (%.2f= <=3D%.2f) using literal (ll=3D=3D%u) (hist:%u,%u,%u)", + (int)(inr-istart), cur, ZSTD_fCost(price),= ZSTD_fCost(opt[cur].price), litlen, opt[cur-1].rep[0], opt[cur-1].rep[1], opt[= cur-1].rep[2]); - opt[cur].mlen =3D 0; - opt[cur].off =3D 0; + opt[cur] =3D opt[cur-1]; opt[cur].litlen =3D litlen; opt[cur].price =3D price; + if ( (optLevel >=3D 1) /* additional check only for hi= gher modes */ + && (prevMatch.litlen =3D=3D 0) /* replace a match */ + && (LL_INCPRICE(1) < 0) /* ll1 is cheaper than ll0 */ + && LIKELY(ip + cur < iend) + ) { + /* check next position, in case it would be cheape= r */ + int with1literal =3D prevMatch.price + LIT_PRICE(i= p+cur) + LL_INCPRICE(1); + int withMoreLiterals =3D price + LIT_PRICE(ip+cur)= + LL_INCPRICE(litlen+1); + DEBUGLOG(7, "then at next rPos %u : match+1lit %.2= f vs %ulits %.2f", + cur+1, ZSTD_fCost(with1literal), litlen+1,= ZSTD_fCost(withMoreLiterals)); + if ( (with1literal < withMoreLiterals) + && (with1literal < opt[cur+1].price) ) { + /* update offset history - before it disappear= s */ + U32 const prev =3D cur - prevMatch.mlen; + Repcodes_t const newReps =3D ZSTD_newRep(opt[p= rev].rep, prevMatch.off, opt[prev].litlen=3D=3D0); + assert(cur >=3D prevMatch.mlen); + DEBUGLOG(7, "=3D=3D> match+1lit is cheaper (%.= 2f < %.2f) (hist:%u,%u,%u) !", + ZSTD_fCost(with1literal), ZSTD_fCo= st(withMoreLiterals), + newReps.rep[0], newReps.rep[1], ne= wReps.rep[2] ); + opt[cur+1] =3D prevMatch; /* mlen & offbase */ + ZSTD_memcpy(opt[cur+1].rep, &newReps, sizeof(R= epcodes_t)); + opt[cur+1].litlen =3D 1; + opt[cur+1].price =3D with1literal; + if (last_pos < cur+1) last_pos =3D cur+1; + } + } } else { - DEBUGLOG(7, "cPos:%zi=3D=3DrPos:%u : literal would cos= t more (%.2f>%.2f) (hist:%u,%u,%u)", - inr-istart, cur, ZSTD_fCost(price), ZSTD_f= Cost(opt[cur].price), - opt[cur].rep[0], opt[cur].rep[1], opt[cur]= .rep[2]); + DEBUGLOG(7, "cPos:%i=3D=3DrPos:%u : literal would cost= more (%.2f>%.2f)", + (int)(inr-istart), cur, ZSTD_fCost(price),= ZSTD_fCost(opt[cur].price)); } } =20 - /* Set the repcodes of the current position. We must do it here - * because we rely on the repcodes of the 2nd to last sequence= being - * correct to set the next chunks repcodes during the backward - * traversal. + /* Offset history is not updated during match comparison. 
+ * Do it here, now that the match is selected and confirmed. */ - ZSTD_STATIC_ASSERT(sizeof(opt[cur].rep) =3D=3D sizeof(repcodes= _t)); + ZSTD_STATIC_ASSERT(sizeof(opt[cur].rep) =3D=3D sizeof(Repcodes= _t)); assert(cur >=3D opt[cur].mlen); - if (opt[cur].mlen !=3D 0) { + if (opt[cur].litlen =3D=3D 0) { + /* just finished a match =3D> alter offset history */ U32 const prev =3D cur - opt[cur].mlen; - repcodes_t const newReps =3D ZSTD_newRep(opt[prev].rep, op= t[cur].off, opt[cur].litlen=3D=3D0); - ZSTD_memcpy(opt[cur].rep, &newReps, sizeof(repcodes_t)); - } else { - ZSTD_memcpy(opt[cur].rep, opt[cur - 1].rep, sizeof(repcode= s_t)); + Repcodes_t const newReps =3D ZSTD_newRep(opt[prev].rep, op= t[cur].off, opt[prev].litlen=3D=3D0); + ZSTD_memcpy(opt[cur].rep, &newReps, sizeof(Repcodes_t)); } =20 /* last match must start at a minimum distance of 8 from oend = */ @@ -1188,38 +1268,37 @@ ZSTD_compressBlock_opt_generic(ZSTD_matchState_t* m= s, =20 if ( (optLevel=3D=3D0) /*static_test*/ && (opt[cur+1].price <=3D opt[cur].price + (BITCOST_MULTIPLI= ER/2)) ) { - DEBUGLOG(7, "move to next rPos:%u : price is <=3D", cur+1); + DEBUGLOG(7, "skip current position : next rPos(%u) price i= s cheaper", cur+1); continue; /* skip unpromising positions; about ~+6% speed= , -0.01 ratio */ } =20 assert(opt[cur].price >=3D 0); - { U32 const ll0 =3D (opt[cur].mlen !=3D 0); - U32 const litlen =3D (opt[cur].mlen =3D=3D 0) ? opt[cur].l= itlen : 0; - U32 const previousPrice =3D (U32)opt[cur].price; - U32 const basePrice =3D previousPrice + ZSTD_litLengthPric= e(0, optStatePtr, optLevel); + { U32 const ll0 =3D (opt[cur].litlen =3D=3D 0); + int const previousPrice =3D opt[cur].price; + int const basePrice =3D previousPrice + LL_PRICE(0); U32 nbMatches =3D getAllMatches(matches, ms, &nextToUpdate= 3, inr, iend, opt[cur].rep, ll0, minMatch); U32 matchNb; =20 ZSTD_optLdm_processMatchCandidate(&optLdm, matches, &nbMat= ches, - (U32)(inr-istart), (U32)= (iend-inr)); + (U32)(inr-istart), (U32)= (iend-inr), + minMatch); =20 if (!nbMatches) { DEBUGLOG(7, "rPos:%u : no match found", cur); continue; } =20 - { U32 const maxML =3D matches[nbMatches-1].len; - DEBUGLOG(7, "cPos:%zi=3D=3DrPos:%u, found %u matches, = of maxLength=3D%u", - inr-istart, cur, nbMatches, maxML); - - if ( (maxML > sufficient_len) - || (cur + maxML >=3D ZSTD_OPT_NUM) ) { - lastSequence.mlen =3D maxML; - lastSequence.off =3D matches[nbMatches-1].off; - lastSequence.litlen =3D litlen; - cur -=3D (opt[cur].mlen=3D=3D0) ? opt[cur].litlen = : 0; /* last sequence is actually only literals, fix cur to last match - n= ote : may underflow, in which case, it's first sequence, and it's okay */ - last_pos =3D cur + ZSTD_totalLen(lastSequence); - if (cur > ZSTD_OPT_NUM) cur =3D 0; /* underflow = =3D> first match */ + { U32 const longestML =3D matches[nbMatches-1].len; + DEBUGLOG(7, "cPos:%i=3D=3DrPos:%u, found %u matches, o= f longest ML=3D%u", + (int)(inr-istart), cur, nbMatches, longest= ML); + + if ( (longestML > sufficient_len) + || (cur + longestML >=3D ZSTD_OPT_NUM) + || (ip + cur + longestML >=3D iend) ) { + lastStretch.mlen =3D longestML; + lastStretch.off =3D matches[nbMatches-1].off; + lastStretch.litlen =3D 0; + last_pos =3D cur + longestML; goto _shortestPath; } } =20 @@ -1230,20 +1309,25 @@ ZSTD_compressBlock_opt_generic(ZSTD_matchState_t* m= s, U32 const startML =3D (matchNb>0) ? 
matches[matchNb-1]= .len+1 : minMatch; U32 mlen; =20 - DEBUGLOG(7, "testing match %u =3D> offCode=3D%4u, mlen= =3D%2u, llen=3D%2u", - matchNb, matches[matchNb].off, lastML, lit= len); + DEBUGLOG(7, "testing match %u =3D> offBase=3D%4u, mlen= =3D%2u, llen=3D%2u", + matchNb, matches[matchNb].off, lastML, opt= [cur].litlen); =20 for (mlen =3D lastML; mlen >=3D startML; mlen--) { /*= scan downward */ U32 const pos =3D cur + mlen; - int const price =3D (int)basePrice + (int)ZSTD_get= MatchPrice(offset, mlen, optStatePtr, optLevel); + int const price =3D basePrice + (int)ZSTD_getMatch= Price(offset, mlen, optStatePtr, optLevel); =20 if ((pos > last_pos) || (price < opt[pos].price)) { DEBUGLOG(7, "rPos:%u (ml=3D%2u) =3D> new bette= r price (%.2f<%.2f)", pos, mlen, ZSTD_fCost(price), ZSTD= _fCost(opt[pos].price)); - while (last_pos < pos) { opt[last_pos+1].price= =3D ZSTD_MAX_PRICE; last_pos++; } /* fill empty positions */ + while (last_pos < pos) { + /* fill empty positions, for future compar= isons */ + last_pos++; + opt[last_pos].price =3D ZSTD_MAX_PRICE; + opt[last_pos].litlen =3D !0; /* just need= s to be !=3D 0, to mean "not an end of match" */ + } opt[pos].mlen =3D mlen; opt[pos].off =3D offset; - opt[pos].litlen =3D litlen; + opt[pos].litlen =3D 0; opt[pos].price =3D price; } else { DEBUGLOG(7, "rPos:%u (ml=3D%2u) =3D> new price= is worse (%.2f>=3D%.2f)", @@ -1251,55 +1335,89 @@ ZSTD_compressBlock_opt_generic(ZSTD_matchState_t* m= s, if (optLevel=3D=3D0) break; /* early update a= bort; gets ~+10% speed for about -0.01 ratio loss */ } } } } + opt[last_pos+1].price =3D ZSTD_MAX_PRICE; } /* for (cur =3D 1; cur <=3D last_pos; cur++) */ =20 - lastSequence =3D opt[last_pos]; - cur =3D last_pos > ZSTD_totalLen(lastSequence) ? last_pos - ZSTD_t= otalLen(lastSequence) : 0; /* single sequence, and it starts before `ip` */ - assert(cur < ZSTD_OPT_NUM); /* control overflow*/ + lastStretch =3D opt[last_pos]; + assert(cur >=3D lastStretch.mlen); + cur =3D last_pos - lastStretch.mlen; =20 _shortestPath: /* cur, last_pos, best_mlen, best_off have to be set */ assert(opt[0].mlen =3D=3D 0); + assert(last_pos >=3D lastStretch.mlen); + assert(cur =3D=3D last_pos - lastStretch.mlen); =20 - /* Set the next chunk's repcodes based on the repcodes of the begi= nning - * of the last match, and the last sequence. This avoids us having= to - * update them while traversing the sequences. - */ - if (lastSequence.mlen !=3D 0) { - repcodes_t const reps =3D ZSTD_newRep(opt[cur].rep, lastSequen= ce.off, lastSequence.litlen=3D=3D0); - ZSTD_memcpy(rep, &reps, sizeof(reps)); + if (lastStretch.mlen=3D=3D0) { + /* no solution : all matches have been converted into literals= */ + assert(lastStretch.litlen =3D=3D (ip - anchor) + last_pos); + ip +=3D last_pos; + continue; + } + assert(lastStretch.off > 0); + + /* Update offset history */ + if (lastStretch.litlen =3D=3D 0) { + /* finishing on a match : update offset history */ + Repcodes_t const reps =3D ZSTD_newRep(opt[cur].rep, lastStretc= h.off, opt[cur].litlen=3D=3D0); + ZSTD_memcpy(rep, &reps, sizeof(Repcodes_t)); } else { - ZSTD_memcpy(rep, opt[cur].rep, sizeof(repcodes_t)); + ZSTD_memcpy(rep, lastStretch.rep, sizeof(Repcodes_t)); + assert(cur >=3D lastStretch.litlen); + cur -=3D lastStretch.litlen; } =20 - { U32 const storeEnd =3D cur + 1; + /* Let's write the shortest path solution. + * It is stored in @opt in reverse order, + * starting from @storeEnd (=3D=3Dcur+2), + * effectively partially @opt overwriting. 
+ * Content is changed too: + * - So far, @opt stored stretches, aka a match followed by litera= ls + * - Now, it will store sequences, aka literals followed by a match + */ + { U32 const storeEnd =3D cur + 2; U32 storeStart =3D storeEnd; - U32 seqPos =3D cur; + U32 stretchPos =3D cur; =20 DEBUGLOG(6, "start reverse traversal (last_pos:%u, cur:%u)", last_pos, cur); (void)last_pos; - assert(storeEnd < ZSTD_OPT_NUM); - DEBUGLOG(6, "last sequence copied into pos=3D%u (llen=3D%u,mle= n=3D%u,ofc=3D%u)", - storeEnd, lastSequence.litlen, lastSequence.mlen, = lastSequence.off); - opt[storeEnd] =3D lastSequence; - while (seqPos > 0) { - U32 const backDist =3D ZSTD_totalLen(opt[seqPos]); + assert(storeEnd < ZSTD_OPT_SIZE); + DEBUGLOG(6, "last stretch copied into pos=3D%u (llen=3D%u,mlen= =3D%u,ofc=3D%u)", + storeEnd, lastStretch.litlen, lastStretch.mlen, la= stStretch.off); + if (lastStretch.litlen > 0) { + /* last "sequence" is unfinished: just a bunch of literals= */ + opt[storeEnd].litlen =3D lastStretch.litlen; + opt[storeEnd].mlen =3D 0; + storeStart =3D storeEnd-1; + opt[storeStart] =3D lastStretch; + } else { + opt[storeEnd] =3D lastStretch; /* note: litlen will be fi= xed */ + storeStart =3D storeEnd; + } + while (1) { + ZSTD_optimal_t nextStretch =3D opt[stretchPos]; + opt[storeStart].litlen =3D nextStretch.litlen; + DEBUGLOG(6, "selected sequence (llen=3D%u,mlen=3D%u,ofc=3D= %u)", + opt[storeStart].litlen, opt[storeStart].mlen, = opt[storeStart].off); + if (nextStretch.mlen =3D=3D 0) { + /* reaching beginning of segment */ + break; + } storeStart--; - DEBUGLOG(6, "sequence from rPos=3D%u copied into pos=3D%u = (llen=3D%u,mlen=3D%u,ofc=3D%u)", - seqPos, storeStart, opt[seqPos].litlen, opt[se= qPos].mlen, opt[seqPos].off); - opt[storeStart] =3D opt[seqPos]; - seqPos =3D (seqPos > backDist) ? 
seqPos - backDist : 0; + opt[storeStart] =3D nextStretch; /* note: litlen will be f= ixed */ + assert(nextStretch.litlen + nextStretch.mlen <=3D stretchP= os); + stretchPos -=3D nextStretch.litlen + nextStretch.mlen; } =20 /* save sequences */ - DEBUGLOG(6, "sending selected sequences into seqStore") + DEBUGLOG(6, "sending selected sequences into seqStore"); { U32 storePos; for (storePos=3DstoreStart; storePos <=3D storeEnd; storeP= os++) { U32 const llen =3D opt[storePos].litlen; U32 const mlen =3D opt[storePos].mlen; - U32 const offCode =3D opt[storePos].off; + U32 const offBase =3D opt[storePos].off; U32 const advance =3D llen + mlen; - DEBUGLOG(6, "considering seq starting at %zi, llen=3D%= u, mlen=3D%u", - anchor - istart, (unsigned)llen, (unsigned= )mlen); + DEBUGLOG(6, "considering seq starting at %i, llen=3D%u= , mlen=3D%u", + (int)(anchor - istart), (unsigned)llen, (u= nsigned)mlen); =20 if (mlen=3D=3D0) { /* only literals =3D> must be last= "sequence", actually starting a new stream of sequences */ assert(storePos =3D=3D storeEnd); /* must be las= t sequence */ @@ -1308,11 +1426,14 @@ ZSTD_compressBlock_opt_generic(ZSTD_matchState_t* m= s, } =20 assert(anchor + llen <=3D iend); - ZSTD_updateStats(optStatePtr, llen, anchor, offCode, m= len); - ZSTD_storeSeq(seqStore, llen, anchor, iend, offCode, m= len); + ZSTD_updateStats(optStatePtr, llen, anchor, offBase, m= len); + ZSTD_storeSeq(seqStore, llen, anchor, iend, offBase, m= len); anchor +=3D advance; ip =3D anchor; } } + DEBUGLOG(7, "new offset history : %u, %u, %u", rep[0], rep[1],= rep[2]); + + /* update all costs */ ZSTD_setBasePrices(optStatePtr, optLevel); } } /* while (ip < ilimit) */ @@ -1320,42 +1441,51 @@ ZSTD_compressBlock_opt_generic(ZSTD_matchState_t* m= s, /* Return the last literals size */ return (size_t)(iend - anchor); } +#endif /* build exclusions */ =20 +#ifndef ZSTD_EXCLUDE_BTOPT_BLOCK_COMPRESSOR static size_t ZSTD_compressBlock_opt0( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize, const ZSTD_dictMode_e dictMode) { return ZSTD_compressBlock_opt_generic(ms, seqStore, rep, src, srcSize,= 0 /* optLevel */, dictMode); } +#endif =20 +#ifndef ZSTD_EXCLUDE_BTULTRA_BLOCK_COMPRESSOR static size_t ZSTD_compressBlock_opt2( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize, const ZSTD_dictMode_e dictMode) { return ZSTD_compressBlock_opt_generic(ms, seqStore, rep, src, srcSize,= 2 /* optLevel */, dictMode); } +#endif =20 +#ifndef ZSTD_EXCLUDE_BTOPT_BLOCK_COMPRESSOR size_t ZSTD_compressBlock_btopt( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize) { DEBUGLOG(5, "ZSTD_compressBlock_btopt"); return ZSTD_compressBlock_opt0(ms, seqStore, rep, src, srcSize, ZSTD_n= oDict); } +#endif =20 =20 =20 =20 +#ifndef ZSTD_EXCLUDE_BTULTRA_BLOCK_COMPRESSOR /* ZSTD_initStats_ultra(): * make a first compression pass, just to seed stats with more accurate st= arting values. * only works on first block, with no dictionary and no ldm. - * this function cannot error, hence its contract must be respected. + * this function cannot error out, its narrow contract must be respected. 
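+ * For illustration, the intended call pattern, as implemented by
+ * ZSTD_compressBlock_btultra2() below, is roughly:
+ *     ZSTD_initStats_ultra(ms, seqStore, rep, src, srcSize);
+ *     ZSTD_compressBlock_opt2(ms, seqStore, rep, src, srcSize, ZSTD_noDict);
+ * where the first pass only seeds ms->opt statistics for the second pass.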
*/ -static void -ZSTD_initStats_ultra(ZSTD_matchState_t* ms, - seqStore_t* seqStore, - U32 rep[ZSTD_REP_NUM], - const void* src, size_t srcSize) +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +void ZSTD_initStats_ultra(ZSTD_MatchState_t* ms, + SeqStore_t* seqStore, + U32 rep[ZSTD_REP_NUM], + const void* src, size_t srcSize) { U32 tmpRep[ZSTD_REP_NUM]; /* updated rep codes will sink here */ ZSTD_memcpy(tmpRep, rep, sizeof(tmpRep)); @@ -1368,7 +1498,7 @@ ZSTD_initStats_ultra(ZSTD_matchState_t* ms, =20 ZSTD_compressBlock_opt2(ms, seqStore, tmpRep, src, srcSize, ZSTD_noDic= t); /* generate stats into ms->opt*/ =20 - /* invalidate first scan from history */ + /* invalidate first scan from history, only keep entropy stats */ ZSTD_resetSeqStore(seqStore); ms->window.base -=3D srcSize; ms->window.dictLimit +=3D (U32)srcSize; @@ -1378,7 +1508,7 @@ ZSTD_initStats_ultra(ZSTD_matchState_t* ms, } =20 size_t ZSTD_compressBlock_btultra( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize) { DEBUGLOG(5, "ZSTD_compressBlock_btultra (srcSize=3D%zu)", srcSize); @@ -1386,16 +1516,16 @@ size_t ZSTD_compressBlock_btultra( } =20 size_t ZSTD_compressBlock_btultra2( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize) { U32 const curr =3D (U32)((const BYTE*)src - ms->window.base); DEBUGLOG(5, "ZSTD_compressBlock_btultra2 (srcSize=3D%zu)", srcSize); =20 - /* 2-pass strategy: + /* 2-passes strategy: * this strategy makes a first pass over first block to collect statis= tics - * and seed next round's statistics with it. - * After 1st pass, function forgets everything, and starts a new block. + * in order to seed next round's statistics with it. + * After 1st pass, function forgets history, and starts a new block. * Consequently, this can only work if no data has been previously loa= ded in tables, * aka, no dictionary, no prefix, no ldm preprocessing. 
* The compression ratio gain is generally small (~0.5% on first block= ), @@ -1404,42 +1534,47 @@ size_t ZSTD_compressBlock_btultra2( if ( (ms->opt.litLengthSum=3D=3D0) /* first block */ && (seqStore->sequences =3D=3D seqStore->sequencesStart) /* no ldm = */ && (ms->window.dictLimit =3D=3D ms->window.lowLimit) /* no diction= ary */ - && (curr =3D=3D ms->window.dictLimit) /* start of frame, nothing a= lready loaded nor skipped */ - && (srcSize > ZSTD_PREDEF_THRESHOLD) + && (curr =3D=3D ms->window.dictLimit) /* start of frame, nothing = already loaded nor skipped */ + && (srcSize > ZSTD_PREDEF_THRESHOLD) /* input large enough to not em= ploy default stats */ ) { ZSTD_initStats_ultra(ms, seqStore, rep, src, srcSize); } =20 return ZSTD_compressBlock_opt2(ms, seqStore, rep, src, srcSize, ZSTD_n= oDict); } +#endif =20 +#ifndef ZSTD_EXCLUDE_BTOPT_BLOCK_COMPRESSOR size_t ZSTD_compressBlock_btopt_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize) { return ZSTD_compressBlock_opt0(ms, seqStore, rep, src, srcSize, ZSTD_d= ictMatchState); } =20 -size_t ZSTD_compressBlock_btultra_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_btopt_extDict( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize) { - return ZSTD_compressBlock_opt2(ms, seqStore, rep, src, srcSize, ZSTD_d= ictMatchState); + return ZSTD_compressBlock_opt0(ms, seqStore, rep, src, srcSize, ZSTD_e= xtDict); } +#endif =20 -size_t ZSTD_compressBlock_btopt_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_BTULTRA_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_btultra_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize) { - return ZSTD_compressBlock_opt0(ms, seqStore, rep, src, srcSize, ZSTD_e= xtDict); + return ZSTD_compressBlock_opt2(ms, seqStore, rep, src, srcSize, ZSTD_d= ictMatchState); } =20 size_t ZSTD_compressBlock_btultra_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize) { return ZSTD_compressBlock_opt2(ms, seqStore, rep, src, srcSize, ZSTD_e= xtDict); } +#endif =20 /* note : no btultra2 variant for extDict nor dictMatchState, * because btultra2 is not meant to work with dictionaries diff --git a/lib/zstd/compress/zstd_opt.h b/lib/zstd/compress/zstd_opt.h index 22b862858ba7..fbdc540ec9d1 100644 --- a/lib/zstd/compress/zstd_opt.h +++ b/lib/zstd/compress/zstd_opt.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -11,40 +12,62 @@ #ifndef ZSTD_OPT_H #define ZSTD_OPT_H =20 - #include "zstd_compress_internal.h" =20 +#if !defined(ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_BTOPT_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_BTULTRA_BLOCK_COMPRESSOR) /* used in ZSTD_loadDictionaryContent() */ -void ZSTD_updateTree(ZSTD_matchState_t* ms, const BYTE* ip, const BYTE* ie= nd); +void ZSTD_updateTree(ZSTD_MatchState_t* ms, const BYTE* ip, const BYTE* ie= nd); +#endif =20 +#ifndef ZSTD_EXCLUDE_BTOPT_BLOCK_COMPRESSOR size_t ZSTD_compressBlock_btopt( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_btultra( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_btopt_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_btultra2( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_btopt_extDict( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); =20 +#define ZSTD_COMPRESSBLOCK_BTOPT ZSTD_compressBlock_btopt +#define ZSTD_COMPRESSBLOCK_BTOPT_DICTMATCHSTATE ZSTD_compressBlock_btopt_d= ictMatchState +#define ZSTD_COMPRESSBLOCK_BTOPT_EXTDICT ZSTD_compressBlock_btopt_extDict +#else +#define ZSTD_COMPRESSBLOCK_BTOPT NULL +#define ZSTD_COMPRESSBLOCK_BTOPT_DICTMATCHSTATE NULL +#define ZSTD_COMPRESSBLOCK_BTOPT_EXTDICT NULL +#endif =20 -size_t ZSTD_compressBlock_btopt_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_BTULTRA_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_btultra( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_btultra_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], - void const* src, size_t srcSize); - -size_t ZSTD_compressBlock_btopt_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_btultra_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); =20 /* note : no btultra2 variant for extDict nor dictMatchState, * because btultra2 is not meant to work with dictionaries * and is only specific for the first block (no prefix) */ +size_t ZSTD_compressBlock_btultra2( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + void const* src, size_t srcSize); =20 +#define ZSTD_COMPRESSBLOCK_BTULTRA ZSTD_compressBlock_btultra +#define ZSTD_COMPRESSBLOCK_BTULTRA_DICTMATCHSTATE ZSTD_compressBlock_btult= ra_dictMatchState +#define ZSTD_COMPRESSBLOCK_BTULTRA_EXTDICT ZSTD_compressBlock_btultra_extD= ict +#define ZSTD_COMPRESSBLOCK_BTULTRA2 ZSTD_compressBlock_btultra2 +#else +#define ZSTD_COMPRESSBLOCK_BTULTRA NULL +#define ZSTD_COMPRESSBLOCK_BTULTRA_DICTMATCHSTATE NULL +#define ZSTD_COMPRESSBLOCK_BTULTRA_EXTDICT NULL +#define ZSTD_COMPRESSBLOCK_BTULTRA2 NULL +#endif =20 #endif /* ZSTD_OPT_H */ diff --git a/lib/zstd/compress/zstd_preSplit.c b/lib/zstd/compress/zstd_pre= 
Split.c new file mode 100644 index 000000000000..7d9403c9a3bc --- /dev/null +++ b/lib/zstd/compress/zstd_preSplit.c @@ -0,0 +1,239 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause +/* + * Copyright (c) Meta Platforms, Inc. and affiliates. + * All rights reserved. + * + * This source code is licensed under both the BSD-style license (found in= the + * LICENSE file in the root directory of this source tree) and the GPLv2 (= found + * in the COPYING file in the root directory of this source tree). + * You may select, at your option, one of the above-listed licenses. + */ + +#include "../common/compiler.h" /* ZSTD_ALIGNOF */ +#include "../common/mem.h" /* S64 */ +#include "../common/zstd_deps.h" /* ZSTD_memset */ +#include "../common/zstd_internal.h" /* ZSTD_STATIC_ASSERT */ +#include "hist.h" /* HIST_add */ +#include "zstd_preSplit.h" + + +#define BLOCKSIZE_MIN 3500 +#define THRESHOLD_PENALTY_RATE 16 +#define THRESHOLD_BASE (THRESHOLD_PENALTY_RATE - 2) +#define THRESHOLD_PENALTY 3 + +#define HASHLENGTH 2 +#define HASHLOG_MAX 10 +#define HASHTABLESIZE (1 << HASHLOG_MAX) +#define HASHMASK (HASHTABLESIZE - 1) +#define KNUTH 0x9e3779b9 + +/* for hashLog > 8, hash 2 bytes. + * for hashLog =3D=3D 8, just take the byte, no hashing. + * The speed of this method relies on compile-time constant propagation */ +FORCE_INLINE_TEMPLATE unsigned hash2(const void *p, unsigned hashLog) +{ + assert(hashLog >=3D 8); + if (hashLog =3D=3D 8) return (U32)((const BYTE*)p)[0]; + assert(hashLog <=3D HASHLOG_MAX); + return (U32)(MEM_read16(p)) * KNUTH >> (32 - hashLog); +} + + +typedef struct { + unsigned events[HASHTABLESIZE]; + size_t nbEvents; +} Fingerprint; +typedef struct { + Fingerprint pastEvents; + Fingerprint newEvents; +} FPStats; + +static void initStats(FPStats* fpstats) +{ + ZSTD_memset(fpstats, 0, sizeof(FPStats)); +} + +FORCE_INLINE_TEMPLATE void +addEvents_generic(Fingerprint* fp, const void* src, size_t srcSize, size_t= samplingRate, unsigned hashLog) +{ + const char* p =3D (const char*)src; + size_t limit =3D srcSize - HASHLENGTH + 1; + size_t n; + assert(srcSize >=3D HASHLENGTH); + for (n =3D 0; n < limit; n+=3DsamplingRate) { + fp->events[hash2(p+n, hashLog)]++; + } + fp->nbEvents +=3D limit/samplingRate; +} + +FORCE_INLINE_TEMPLATE void +recordFingerprint_generic(Fingerprint* fp, const void* src, size_t srcSize= , size_t samplingRate, unsigned hashLog) +{ + ZSTD_memset(fp, 0, sizeof(unsigned) * ((size_t)1 << hashLog)); + fp->nbEvents =3D 0; + addEvents_generic(fp, src, srcSize, samplingRate, hashLog); +} + +typedef void (*RecordEvents_f)(Fingerprint* fp, const void* src, size_t sr= cSize); + +#define FP_RECORD(_rate) ZSTD_recordFingerprint_##_rate + +#define ZSTD_GEN_RECORD_FINGERPRINT(_rate, _hSize) = \ + static void FP_RECORD(_rate)(Fingerprint* fp, const void* src, size_t = srcSize) \ + { = \ + recordFingerprint_generic(fp, src, srcSize, _rate, _hSize); = \ + } + +ZSTD_GEN_RECORD_FINGERPRINT(1, 10) +ZSTD_GEN_RECORD_FINGERPRINT(5, 10) +ZSTD_GEN_RECORD_FINGERPRINT(11, 9) +ZSTD_GEN_RECORD_FINGERPRINT(43, 8) + + +static U64 abs64(S64 s64) { return (U64)((s64 < 0) ? 
-s64 : s64); } + +static U64 fpDistance(const Fingerprint* fp1, const Fingerprint* fp2, unsi= gned hashLog) +{ + U64 distance =3D 0; + size_t n; + assert(hashLog <=3D HASHLOG_MAX); + for (n =3D 0; n < ((size_t)1 << hashLog); n++) { + distance +=3D + abs64((S64)fp1->events[n] * (S64)fp2->nbEvents - (S64)fp2->eve= nts[n] * (S64)fp1->nbEvents); + } + return distance; +} + +/* Compare newEvents with pastEvents + * return 1 when considered "too different" + */ +static int compareFingerprints(const Fingerprint* ref, + const Fingerprint* newfp, + int penalty, + unsigned hashLog) +{ + assert(ref->nbEvents > 0); + assert(newfp->nbEvents > 0); + { U64 p50 =3D (U64)ref->nbEvents * (U64)newfp->nbEvents; + U64 deviation =3D fpDistance(ref, newfp, hashLog); + U64 threshold =3D p50 * (U64)(THRESHOLD_BASE + penalty) / THRESHOL= D_PENALTY_RATE; + return deviation >=3D threshold; + } +} + +static void mergeEvents(Fingerprint* acc, const Fingerprint* newfp) +{ + size_t n; + for (n =3D 0; n < HASHTABLESIZE; n++) { + acc->events[n] +=3D newfp->events[n]; + } + acc->nbEvents +=3D newfp->nbEvents; +} + +static void flushEvents(FPStats* fpstats) +{ + size_t n; + for (n =3D 0; n < HASHTABLESIZE; n++) { + fpstats->pastEvents.events[n] =3D fpstats->newEvents.events[n]; + } + fpstats->pastEvents.nbEvents =3D fpstats->newEvents.nbEvents; + ZSTD_memset(&fpstats->newEvents, 0, sizeof(fpstats->newEvents)); +} + +static void removeEvents(Fingerprint* acc, const Fingerprint* slice) +{ + size_t n; + for (n =3D 0; n < HASHTABLESIZE; n++) { + assert(acc->events[n] >=3D slice->events[n]); + acc->events[n] -=3D slice->events[n]; + } + acc->nbEvents -=3D slice->nbEvents; +} + +#define CHUNKSIZE (8 << 10) +static size_t ZSTD_splitBlock_byChunks(const void* blockStart, size_t bloc= kSize, + int level, + void* workspace, size_t wkspSize) +{ + static const RecordEvents_f records_fs[] =3D { + FP_RECORD(43), FP_RECORD(11), FP_RECORD(5), FP_RECORD(1) + }; + static const unsigned hashParams[] =3D { 8, 9, 10, 10 }; + const RecordEvents_f record_f =3D (assert(0<=3Dlevel && level<=3D3), r= ecords_fs[level]); + FPStats* const fpstats =3D (FPStats*)workspace; + const char* p =3D (const char*)blockStart; + int penalty =3D THRESHOLD_PENALTY; + size_t pos =3D 0; + assert(blockSize =3D=3D (128 << 10)); + assert(workspace !=3D NULL); + assert((size_t)workspace % ZSTD_ALIGNOF(FPStats) =3D=3D 0); + ZSTD_STATIC_ASSERT(ZSTD_SLIPBLOCK_WORKSPACESIZE >=3D sizeof(FPStats)); + assert(wkspSize >=3D sizeof(FPStats)); (void)wkspSize; + + initStats(fpstats); + record_f(&fpstats->pastEvents, p, CHUNKSIZE); + for (pos =3D CHUNKSIZE; pos <=3D blockSize - CHUNKSIZE; pos +=3D CHUNK= SIZE) { + record_f(&fpstats->newEvents, p + pos, CHUNKSIZE); + if (compareFingerprints(&fpstats->pastEvents, &fpstats->newEvents,= penalty, hashParams[level])) { + return pos; + } else { + mergeEvents(&fpstats->pastEvents, &fpstats->newEvents); + if (penalty > 0) penalty--; + } + } + assert(pos =3D=3D blockSize); + return blockSize; + (void)flushEvents; (void)removeEvents; +} + +/* ZSTD_splitBlock_fromBorders(): very fast strategy : + * compare fingerprint from beginning and end of the block, + * derive from their difference if it's preferable to split in the middle, + * repeat the process a second time, for finer grained decision. + * 3 times did not bring improvements, so I stopped at 2. + * Benefits are good enough for a cheap heuristic. + * More accurate splitting saves more, but speed impact is also more perce= ptible. + * For better accuracy, use more elaborate variant *_byChunks. + */ +static size_t ZSTD_splitBlock_fromBorders(const void* blockStart, size_t b= lockSize, + void* workspace, size_t wkspSize) +{ +#define SEGMENT_SIZE 512 + FPStats* const fpstats =3D (FPStats*)workspace; + Fingerprint* middleEvents =3D (Fingerprint*)(void*)((char*)workspace += 512 * sizeof(unsigned)); + assert(blockSize =3D=3D (128 << 10)); + assert(workspace !=3D NULL); + assert((size_t)workspace % ZSTD_ALIGNOF(FPStats) =3D=3D 0); + ZSTD_STATIC_ASSERT(ZSTD_SLIPBLOCK_WORKSPACESIZE >=3D sizeof(FPStats)); + assert(wkspSize >=3D sizeof(FPStats)); (void)wkspSize; + + initStats(fpstats); + HIST_add(fpstats->pastEvents.events, blockStart, SEGMENT_SIZE); + HIST_add(fpstats->newEvents.events, (const char*)blockStart + blockSiz= e - SEGMENT_SIZE, SEGMENT_SIZE); + fpstats->pastEvents.nbEvents =3D fpstats->newEvents.nbEvents =3D SEGME= NT_SIZE; + if (!compareFingerprints(&fpstats->pastEvents, &fpstats->newEvents, 0,= 8)) + return blockSize; + + HIST_add(middleEvents->events, (const char*)blockStart + blockSize/2 -= SEGMENT_SIZE/2, SEGMENT_SIZE); + middleEvents->nbEvents =3D SEGMENT_SIZE; + { U64 const distFromBegin =3D fpDistance(&fpstats->pastEvents, middl= eEvents, 8); + U64 const distFromEnd =3D fpDistance(&fpstats->newEvents, middleEv= ents, 8); + U64 const minDistance =3D SEGMENT_SIZE * SEGMENT_SIZE / 3; + if (abs64((S64)distFromBegin - (S64)distFromEnd) < minDistance) + return 64 KB; + return (distFromBegin > distFromEnd) ? 32 KB : 96 KB; + } +} + +size_t ZSTD_splitBlock(const void* blockStart, size_t blockSize, + int level, + void* workspace, size_t wkspSize) +{ + DEBUGLOG(6, "ZSTD_splitBlock (level=3D%i)", level); + assert(0<=3Dlevel && level<=3D4); + if (level =3D=3D 0) + return ZSTD_splitBlock_fromBorders(blockStart, blockSize, workspac= e, wkspSize); + /* level >=3D 1*/ + return ZSTD_splitBlock_byChunks(blockStart, blockSize, level-1, worksp= ace, wkspSize); +} diff --git a/lib/zstd/compress/zstd_preSplit.h b/lib/zstd/compress/zstd_pre= Split.h new file mode 100644 index 000000000000..f98f797fe191 --- /dev/null +++ b/lib/zstd/compress/zstd_preSplit.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ +/* + * Copyright (c) Meta Platforms, Inc. and affiliates. + * All rights reserved. + * + * This source code is licensed under both the BSD-style license (found in= the + * LICENSE file in the root directory of this source tree) and the GPLv2 (= found + * in the COPYING file in the root directory of this source tree). + * You may select, at your option, one of the above-listed licenses. + */ + +#ifndef ZSTD_PRESPLIT_H +#define ZSTD_PRESPLIT_H + +#include <stddef.h> /* size_t */ + +#define ZSTD_SLIPBLOCK_WORKSPACESIZE 8208 + +/* ZSTD_splitBlock(): + * @level must be a value between 0 and 4. + * higher levels spend more energy to detect block boundaries. + * @workspace must be aligned for size_t. + * @wkspSize must be at least >=3D ZSTD_SLIPBLOCK_WORKSPACESIZE + * note: + * For the time being, this function only accepts full 128 KB blocks. + * Therefore, @blockSize must be =3D=3D 128 KB. + * While this could be extended to smaller sizes in the future, + * it is not yet clear if this would be useful. TBD. 
+ */ +size_t ZSTD_splitBlock(const void* blockStart, size_t blockSize, + int level, + void* workspace, size_t wkspSize); + +#endif /* ZSTD_PRESPLIT_H */ diff --git a/lib/zstd/decompress/huf_decompress.c b/lib/zstd/decompress/huf= _decompress.c index 60958afebc41..ac8b87f48f84 100644 --- a/lib/zstd/decompress/huf_decompress.c +++ b/lib/zstd/decompress/huf_decompress.c @@ -1,7 +1,8 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* ****************************************************************** * huff0 huffman decoder, * part of Finite State Entropy library - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * * You can contact the author at : * - FSE+HUF source repository : https://github.com/Cyan4973/FiniteStateE= ntropy @@ -19,10 +20,10 @@ #include "../common/compiler.h" #include "../common/bitstream.h" /* BIT_* */ #include "../common/fse.h" /* to compress headers */ -#define HUF_STATIC_LINKING_ONLY #include "../common/huf.h" #include "../common/error_private.h" #include "../common/zstd_internal.h" +#include "../common/bits.h" /* ZSTD_highbit32, ZSTD_countTrailingZer= os64 */ =20 /* ************************************************************** * Constants @@ -34,6 +35,12 @@ * Macros ****************************************************************/ =20 +#ifdef HUF_DISABLE_FAST_DECODE +# define HUF_ENABLE_FAST_DECODE 0 +#else +# define HUF_ENABLE_FAST_DECODE 1 +#endif + /* These two optional macros force the use one way or another of the two * Huffman decompression implementations. You can't force in both directio= ns * at the same time. @@ -43,27 +50,25 @@ #error "Cannot force the use of the X1 and X2 decoders at the same time!" #endif =20 -#if ZSTD_ENABLE_ASM_X86_64_BMI2 && DYNAMIC_BMI2 -# define HUF_ASM_X86_64_BMI2_ATTRS BMI2_TARGET_ATTRIBUTE +/* When DYNAMIC_BMI2 is enabled, fast decoders are only called when bmi2 is + * supported at runtime, so we can add the BMI2 target attribute. + * When it is disabled, we will still get BMI2 if it is enabled statically. 
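+ * (For example, if __BMI2__ is already defined at compile time, the regular
+ * code paths are compiled with BMI2 code generation anyway, so neither the
+ * target attribute nor a runtime check is needed.)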
+ */ +#if DYNAMIC_BMI2 +# define HUF_FAST_BMI2_ATTRS BMI2_TARGET_ATTRIBUTE #else -# define HUF_ASM_X86_64_BMI2_ATTRS +# define HUF_FAST_BMI2_ATTRS #endif =20 #define HUF_EXTERN_C #define HUF_ASM_DECL HUF_EXTERN_C =20 -#if DYNAMIC_BMI2 || (ZSTD_ENABLE_ASM_X86_64_BMI2 && defined(__BMI2__)) +#if DYNAMIC_BMI2 # define HUF_NEED_BMI2_FUNCTION 1 #else # define HUF_NEED_BMI2_FUNCTION 0 #endif =20 -#if !(ZSTD_ENABLE_ASM_X86_64_BMI2 && defined(__BMI2__)) -# define HUF_NEED_DEFAULT_FUNCTION 1 -#else -# define HUF_NEED_DEFAULT_FUNCTION 0 -#endif - /* ************************************************************** * Error Management ****************************************************************/ @@ -80,6 +85,11 @@ /* ************************************************************** * BMI2 Variant Wrappers ****************************************************************/ +typedef size_t (*HUF_DecompressUsingDTableFn)(void *dst, size_t dstSize, + const void *cSrc, + size_t cSrcSize, + const HUF_DTable *DTable); + #if DYNAMIC_BMI2 =20 #define HUF_DGEN(fn) = \ @@ -101,9 +111,9 @@ } = \ = \ static size_t fn(void* dst, size_t dstSize, void const* cSrc, = \ - size_t cSrcSize, HUF_DTable const* DTable, int bmi2) = \ + size_t cSrcSize, HUF_DTable const* DTable, int flags)= \ { = \ - if (bmi2) { = \ + if (flags & HUF_flags_bmi2) { = \ return fn##_bmi2(dst, dstSize, cSrc, cSrcSize, DTable); = \ } = \ return fn##_default(dst, dstSize, cSrc, cSrcSize, DTable); = \ @@ -113,9 +123,9 @@ =20 #define HUF_DGEN(fn) = \ static size_t fn(void* dst, size_t dstSize, void const* cSrc, = \ - size_t cSrcSize, HUF_DTable const* DTable, int bmi2) = \ + size_t cSrcSize, HUF_DTable const* DTable, int flags)= \ { = \ - (void)bmi2; = \ + (void)flags; = \ return fn##_body(dst, dstSize, cSrc, cSrcSize, DTable); = \ } =20 @@ -134,43 +144,66 @@ static DTableDesc HUF_getDTableDesc(const HUF_DTable*= table) return dtd; } =20 -#if ZSTD_ENABLE_ASM_X86_64_BMI2 - -static size_t HUF_initDStream(BYTE const* ip) { +static size_t HUF_initFastDStream(BYTE const* ip) { BYTE const lastByte =3D ip[7]; - size_t const bitsConsumed =3D lastByte ? 8 - BIT_highbit32(lastByte) := 0; + size_t const bitsConsumed =3D lastByte ? 8 - ZSTD_highbit32(lastByte) = : 0; size_t const value =3D MEM_readLEST(ip) | 1; assert(bitsConsumed <=3D 8); + assert(sizeof(size_t) =3D=3D 8); return value << bitsConsumed; } + + +/* + * The input/output arguments to the Huffman fast decoding loop: + * + * ip [in/out] - The input pointers, must be updated to reflect what is co= nsumed. + * op [in/out] - The output pointers, must be updated to reflect what is w= ritten. + * bits [in/out] - The bitstream containers, must be updated to reflect th= e current state. + * dt [in] - The decoding table. + * ilowest [in] - The beginning of the valid range of the input. Decoders = may read + * down to this pointer. It may be below iend[0]. + * oend [in] - The end of the output stream. op[3] must not cross oend. + * iend [in] - The end of each input stream. ip[i] may cross iend[i], + * as long as it is above ilowest, but that indicates corrupti= on. + */ typedef struct { BYTE const* ip[4]; BYTE* op[4]; U64 bits[4]; void const* dt; - BYTE const* ilimit; + BYTE const* ilowest; BYTE* oend; BYTE const* iend[4]; -} HUF_DecompressAsmArgs; +} HUF_DecompressFastArgs; + +typedef void (*HUF_DecompressFastLoopFn)(HUF_DecompressFastArgs*); =20 /* - * Initializes args for the asm decoding loop. - * @returns 0 on success - * 1 if the fallback implementation should be used. 
+ * Initializes args for the fast decoding loop. + * @returns 1 on success + * 0 if the fallback implementation should be used. * Or an error code on failure. */ -static size_t HUF_DecompressAsmArgs_init(HUF_DecompressAsmArgs* args, void= * dst, size_t dstSize, void const* src, size_t srcSize, const HUF_DTable* D= Table) +static size_t HUF_DecompressFastArgs_init(HUF_DecompressFastArgs* args, vo= id* dst, size_t dstSize, void const* src, size_t srcSize, const HUF_DTable*= DTable) { void const* dt =3D DTable + 1; U32 const dtLog =3D HUF_getDTableDesc(DTable).tableLog; =20 - const BYTE* const ilimit =3D (const BYTE*)src + 6 + 8; + const BYTE* const istart =3D (const BYTE*)src; =20 - BYTE* const oend =3D (BYTE*)dst + dstSize; + BYTE* const oend =3D ZSTD_maybeNullPtrAdd((BYTE*)dst, dstSize); =20 - /* The following condition is false on x32 platform, - * but HUF_asm is not compatible with this ABI */ - if (!(MEM_isLittleEndian() && !MEM_32bits())) return 1; + /* The fast decoding loop assumes 64-bit little-endian. + * This condition is false on x32. + */ + if (!MEM_isLittleEndian() || MEM_32bits()) + return 0; + + /* Avoid nullptr addition */ + if (dstSize =3D=3D 0) + return 0; + assert(dst !=3D NULL); =20 /* strict minimum : jump table + 1 byte per stream */ if (srcSize < 10) @@ -181,11 +214,10 @@ static size_t HUF_DecompressAsmArgs_init(HUF_Decompre= ssAsmArgs* args, void* dst, * On small inputs we don't have enough data to trigger the fast loop,= so use the old decoder. */ if (dtLog !=3D HUF_DECODER_FAST_TABLELOG) - return 1; + return 0; =20 /* Read the jump table. */ { - const BYTE* const istart =3D (const BYTE*)src; size_t const length1 =3D MEM_readLE16(istart); size_t const length2 =3D MEM_readLE16(istart+2); size_t const length3 =3D MEM_readLE16(istart+4); @@ -195,13 +227,11 @@ static size_t HUF_DecompressAsmArgs_init(HUF_Decompre= ssAsmArgs* args, void* dst, args->iend[2] =3D args->iend[1] + length2; args->iend[3] =3D args->iend[2] + length3; =20 - /* HUF_initDStream() requires this, and this small of an input + /* HUF_initFastDStream() requires this, and this small of an input * won't benefit from the ASM loop anyways. - * length1 must be >=3D 16 so that ip[0] >=3D ilimit before the lo= op - * starts. */ - if (length1 < 16 || length2 < 8 || length3 < 8 || length4 < 8) - return 1; + if (length1 < 8 || length2 < 8 || length3 < 8 || length4 < 8) + return 0; if (length4 > srcSize) return ERROR(corruption_detected); /* ove= rflow */ } /* ip[] contains the position that is currently loaded into bits[]. */ @@ -218,7 +248,7 @@ static size_t HUF_DecompressAsmArgs_init(HUF_Decompress= AsmArgs* args, void* dst, =20 /* No point to call the ASM loop for tiny outputs. */ if (args->op[3] >=3D oend) - return 1; + return 0; =20 /* bits[] is the bit container. * It is read from the MSB down to the LSB. @@ -227,24 +257,25 @@ static size_t HUF_DecompressAsmArgs_init(HUF_Decompre= ssAsmArgs* args, void* dst, * set, so that CountTrailingZeros(bits[]) can be used * to count how many bits we've consumed. */ - args->bits[0] =3D HUF_initDStream(args->ip[0]); - args->bits[1] =3D HUF_initDStream(args->ip[1]); - args->bits[2] =3D HUF_initDStream(args->ip[2]); - args->bits[3] =3D HUF_initDStream(args->ip[3]); - - /* If ip[] >=3D ilimit, it is guaranteed to be safe to - * reload bits[]. It may be beyond its section, but is - * guaranteed to be valid (>=3D istart). 
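+ * (Each stream must hold at least 8 bytes so that HUF_initFastDStream()
+ * can load one full 64-bit word; with ilowest =3D=3D istart, later refills
+ * may read below a stream's own section but never below the buffer.)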
- */ - args->ilimit =3D ilimit; + args->bits[0] =3D HUF_initFastDStream(args->ip[0]); + args->bits[1] =3D HUF_initFastDStream(args->ip[1]); + args->bits[2] =3D HUF_initFastDStream(args->ip[2]); + args->bits[3] =3D HUF_initFastDStream(args->ip[3]); + + /* The decoders must be sure to never read beyond ilowest. + * This is lower than iend[0], but allowing decoders to read + * down to ilowest can allow an extra iteration or two in the + * fast loop. + */ + args->ilowest =3D istart; =20 args->oend =3D oend; args->dt =3D dt; =20 - return 0; + return 1; } =20 -static size_t HUF_initRemainingDStream(BIT_DStream_t* bit, HUF_DecompressA= smArgs const* args, int stream, BYTE* segmentEnd) +static size_t HUF_initRemainingDStream(BIT_DStream_t* bit, HUF_DecompressF= astArgs const* args, int stream, BYTE* segmentEnd) { /* Validate that we haven't overwritten. */ if (args->op[stream] > segmentEnd) @@ -258,15 +289,33 @@ static size_t HUF_initRemainingDStream(BIT_DStream_t*= bit, HUF_DecompressAsmArgs return ERROR(corruption_detected); =20 /* Construct the BIT_DStream_t. */ - bit->bitContainer =3D MEM_readLE64(args->ip[stream]); - bit->bitsConsumed =3D ZSTD_countTrailingZeros((size_t)args->bits[strea= m]); - bit->start =3D (const char*)args->iend[0]; + assert(sizeof(size_t) =3D=3D 8); + bit->bitContainer =3D MEM_readLEST(args->ip[stream]); + bit->bitsConsumed =3D ZSTD_countTrailingZeros64(args->bits[stream]); + bit->start =3D (const char*)args->ilowest; bit->limitPtr =3D bit->start + sizeof(size_t); bit->ptr =3D (const char*)args->ip[stream]; =20 return 0; } -#endif + +/* Calls X(N) for each stream 0, 1, 2, 3. */ +#define HUF_4X_FOR_EACH_STREAM(X) \ + do { \ + X(0); \ + X(1); \ + X(2); \ + X(3); \ + } while (0) + +/* Calls X(N, var) for each stream 0, 1, 2, 3. */ +#define HUF_4X_FOR_EACH_STREAM_WITH_VAR(X, var) \ + do { \ + X(0, (var)); \ + X(1, (var)); \ + X(2, (var)); \ + X(3, (var)); \ + } while (0) =20 =20 #ifndef HUF_FORCE_DECOMPRESS_X2 @@ -283,10 +332,11 @@ typedef struct { BYTE nbBits; BYTE byte; } HUF_DEltX1= ; /* single-symbol decodi static U64 HUF_DEltX1_set4(BYTE symbol, BYTE nbBits) { U64 D4; if (MEM_isLittleEndian()) { - D4 =3D (symbol << 8) + nbBits; + D4 =3D (U64)((symbol << 8) + nbBits); } else { - D4 =3D symbol + (nbBits << 8); + D4 =3D (U64)(symbol + (nbBits << 8)); } + assert(D4 < (1U << 16)); D4 *=3D 0x0001000100010001ULL; return D4; } @@ -329,13 +379,7 @@ typedef struct { BYTE huffWeight[HUF_SYMBOLVALUE_MAX + 1]; } HUF_ReadDTableX1_Workspace; =20 - -size_t HUF_readDTableX1_wksp(HUF_DTable* DTable, const void* src, size_t s= rcSize, void* workSpace, size_t wkspSize) -{ - return HUF_readDTableX1_wksp_bmi2(DTable, src, srcSize, workSpace, wks= pSize, /* bmi2 */ 0); -} - -size_t HUF_readDTableX1_wksp_bmi2(HUF_DTable* DTable, const void* src, siz= e_t srcSize, void* workSpace, size_t wkspSize, int bmi2) +size_t HUF_readDTableX1_wksp(HUF_DTable* DTable, const void* src, size_t s= rcSize, void* workSpace, size_t wkspSize, int flags) { U32 tableLog =3D 0; U32 nbSymbols =3D 0; @@ -350,7 +394,7 @@ size_t HUF_readDTableX1_wksp_bmi2(HUF_DTable* DTable, c= onst void* src, size_t sr DEBUG_STATIC_ASSERT(sizeof(DTableDesc) =3D=3D sizeof(HUF_DTable)); /* ZSTD_memset(huffWeight, 0, sizeof(huffWeight)); */ /* is not nece= ssary, even though some analyzer complain ... 
*/ =20 - iSize =3D HUF_readStats_wksp(wksp->huffWeight, HUF_SYMBOLVALUE_MAX + 1= , wksp->rankVal, &nbSymbols, &tableLog, src, srcSize, wksp->statsWksp, size= of(wksp->statsWksp), bmi2); + iSize =3D HUF_readStats_wksp(wksp->huffWeight, HUF_SYMBOLVALUE_MAX + 1= , wksp->rankVal, &nbSymbols, &tableLog, src, srcSize, wksp->statsWksp, size= of(wksp->statsWksp), flags); if (HUF_isError(iSize)) return iSize; =20 =20 @@ -377,9 +421,8 @@ size_t HUF_readDTableX1_wksp_bmi2(HUF_DTable* DTable, c= onst void* src, size_t sr * rankStart[0] is not filled because there are no entries in the tabl= e for * weight 0. */ - { - int n; - int nextRankStart =3D 0; + { int n; + U32 nextRankStart =3D 0; int const unroll =3D 4; int const nLimit =3D (int)nbSymbols - unroll + 1; for (n=3D0; n<(int)tableLog+1; n++) { @@ -406,10 +449,9 @@ size_t HUF_readDTableX1_wksp_bmi2(HUF_DTable* DTable, = const void* src, size_t sr * We can switch based on the length to a different inner loop which is * optimized for that particular case. */ - { - U32 w; - int symbol=3Dwksp->rankVal[0]; - int rankStart=3D0; + { U32 w; + int symbol =3D wksp->rankVal[0]; + int rankStart =3D 0; for (w=3D1; w<tableLog+1; ++w) { int const symbolCount =3D wksp->rankVal[w]; int const length =3D (1 << w) >> 1; @@ -483,15 +525,19 @@ HUF_decodeSymbolX1(BIT_DStream_t* Dstream, const HUF_= DEltX1* dt, const U32 dtLog } =20 #define HUF_DECODE_SYMBOLX1_0(ptr, DStreamPtr) \ - *ptr++ =3D HUF_decodeSymbolX1(DStreamPtr, dt, dtLog) + do { *ptr++ =3D HUF_decodeSymbolX1(DStreamPtr, dt, dtLog); } while (0) =20 -#define HUF_DECODE_SYMBOLX1_1(ptr, DStreamPtr) \ - if (MEM_64bits() || (HUF_TABLELOG_MAX<=3D12)) \ - HUF_DECODE_SYMBOLX1_0(ptr, DStreamPtr) +#define HUF_DECODE_SYMBOLX1_1(ptr, DStreamPtr) \ + do { \ + if (MEM_64bits() || (HUF_TABLELOG_MAX<=3D12)) \ + HUF_DECODE_SYMBOLX1_0(ptr, DStreamPtr); \ + } while (0) =20 -#define HUF_DECODE_SYMBOLX1_2(ptr, DStreamPtr) \ - if (MEM_64bits()) \ - HUF_DECODE_SYMBOLX1_0(ptr, DStreamPtr) +#define HUF_DECODE_SYMBOLX1_2(ptr, DStreamPtr) \ + do { \ + if (MEM_64bits()) \ + HUF_DECODE_SYMBOLX1_0(ptr, DStreamPtr); \ + } while (0) =20 HINT_INLINE size_t HUF_decodeStreamX1(BYTE* p, BIT_DStream_t* const bitDPtr, BYTE* const pEnd= , const HUF_DEltX1* const dt, const U32 dtLog) @@ -519,7 +565,7 @@ HUF_decodeStreamX1(BYTE* p, BIT_DStream_t* const bitDPt= r, BYTE* const pEnd, cons while (p < pEnd) HUF_DECODE_SYMBOLX1_0(p, bitDPtr); =20 - return pEnd-pStart; + return (size_t)(pEnd-pStart); } =20 FORCE_INLINE_TEMPLATE size_t @@ -529,7 +575,7 @@ HUF_decompress1X1_usingDTable_internal_body( const HUF_DTable* DTable) { BYTE* op =3D (BYTE*)dst; - BYTE* const oend =3D op + dstSize; + BYTE* const oend =3D ZSTD_maybeNullPtrAdd(op, dstSize); const void* dtPtr =3D DTable + 1; const HUF_DEltX1* const dt =3D (const HUF_DEltX1*)dtPtr; BIT_DStream_t bitD; @@ -545,6 +591,10 @@ HUF_decompress1X1_usingDTable_internal_body( return dstSize; } =20 +/* HUF_decompress4X1_usingDTable_internal_body(): + * Conditions : + * @dstSize >=3D 6 + */ FORCE_INLINE_TEMPLATE size_t HUF_decompress4X1_usingDTable_internal_body( void* dst, size_t dstSize, @@ -553,6 +603,7 @@ HUF_decompress4X1_usingDTable_internal_body( { /* Check */ if (cSrcSize < 10) return ERROR(corruption_detected); /* strict minim= um : jump table + 1 byte per stream */ + if (dstSize < 6) return ERROR(corruption_detected); /* stream = 4-split doesn't work */ =20 { const BYTE* const istart =3D (const BYTE*) cSrc; BYTE* const ostart =3D (BYTE*) dst; @@ -588,6 +639,7 @@ HUF_decompress4X1_usingDTable_internal_body( =20 if (length4 > cSrcSize) return 
ERROR(corruption_detected); /* ov= erflow */ if (opStart4 > oend) return ERROR(corruption_detected); /* ov= erflow */ + assert(dstSize >=3D 6); /* validated above */ CHECK_F( BIT_initDStream(&bitD1, istart1, length1) ); CHECK_F( BIT_initDStream(&bitD2, istart2, length2) ); CHECK_F( BIT_initDStream(&bitD3, istart3, length3) ); @@ -650,52 +702,173 @@ size_t HUF_decompress4X1_usingDTable_internal_bmi2(v= oid* dst, size_t dstSize, vo } #endif =20 -#if HUF_NEED_DEFAULT_FUNCTION static size_t HUF_decompress4X1_usingDTable_internal_default(void* dst, size_t ds= tSize, void const* cSrc, size_t cSrcSize, HUF_DTable const* DTable) { return HUF_decompress4X1_usingDTable_internal_body(dst, dstSize, cSrc,= cSrcSize, DTable); } -#endif =20 #if ZSTD_ENABLE_ASM_X86_64_BMI2 =20 -HUF_ASM_DECL void HUF_decompress4X1_usingDTable_internal_bmi2_asm_loop(HUF= _DecompressAsmArgs* args) ZSTDLIB_HIDDEN; +HUF_ASM_DECL void HUF_decompress4X1_usingDTable_internal_fast_asm_loop(HUF= _DecompressFastArgs* args) ZSTDLIB_HIDDEN; + +#endif + +static HUF_FAST_BMI2_ATTRS +void HUF_decompress4X1_usingDTable_internal_fast_c_loop(HUF_DecompressFast= Args* args) +{ + U64 bits[4]; + BYTE const* ip[4]; + BYTE* op[4]; + U16 const* const dtable =3D (U16 const*)args->dt; + BYTE* const oend =3D args->oend; + BYTE const* const ilowest =3D args->ilowest; + + /* Copy the arguments to local variables */ + ZSTD_memcpy(&bits, &args->bits, sizeof(bits)); + ZSTD_memcpy((void*)(&ip), &args->ip, sizeof(ip)); + ZSTD_memcpy(&op, &args->op, sizeof(op)); + + assert(MEM_isLittleEndian()); + assert(!MEM_32bits()); + + for (;;) { + BYTE* olimit; + int stream; + + /* Assert loop preconditions */ +#ifndef NDEBUG + for (stream =3D 0; stream < 4; ++stream) { + assert(op[stream] <=3D (stream =3D=3D 3 ? oend : op[stream + 1= ])); + assert(ip[stream] >=3D ilowest); + } +#endif + /* Compute olimit */ + { + /* Each iteration produces 5 output symbols per stream */ + size_t const oiters =3D (size_t)(oend - op[3]) / 5; + /* Each iteration consumes up to 11 bits * 5 =3D 55 bits < 7 b= ytes + * per stream. + */ + size_t const iiters =3D (size_t)(ip[0] - ilowest) / 7; + /* We can safely run iters iterations before running bounds ch= ecks */ + size_t const iters =3D MIN(oiters, iiters); + size_t const symbols =3D iters * 5; + + /* We can simply check that op[3] < olimit, instead of checkin= g all + * of our bounds, since we can't hit the other bounds until we= 've run + * iters iterations, which only happens when op[3] =3D=3D olim= it. + */ + olimit =3D op[3] + symbols; + + /* Exit fast decoding loop once we reach the end. */ + if (op[3] =3D=3D olimit) + break; + + /* Exit the decoding loop if any input pointer has crossed the + * previous one. This indicates corruption, and a precondition + * to our loop is that ip[i] >=3D ip[0]. 
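+ * (The 4 bitstreams are laid out back to back and each is consumed
+ * backwards from its own end, so ip[stream] dropping below ip[stream-1]
+ * means that stream has consumed more bytes than its section holds.)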
+ */ + for (stream =3D 1; stream < 4; ++stream) { + if (ip[stream] < ip[stream - 1]) + goto _out; + } + } + +#ifndef NDEBUG + for (stream =3D 1; stream < 4; ++stream) { + assert(ip[stream] >=3D ip[stream - 1]); + } +#endif + +#define HUF_4X1_DECODE_SYMBOL(_stream, _symbol) \ + do { \ + int const index =3D (int)(bits[(_stream)] >> 53); \ + int const entry =3D (int)dtable[index]; \ + bits[(_stream)] <<=3D (entry & 0x3F); \ + op[(_stream)][(_symbol)] =3D (BYTE)((entry >> 8) & 0xFF); \ + } while (0) + +#define HUF_4X1_RELOAD_STREAM(_stream) \ + do { \ + int const ctz =3D ZSTD_countTrailingZeros64(bits[(_stream)]); \ + int const nbBits =3D ctz & 7; \ + int const nbBytes =3D ctz >> 3; \ + op[(_stream)] +=3D 5; \ + ip[(_stream)] -=3D nbBytes; \ + bits[(_stream)] =3D MEM_read64(ip[(_stream)]) | 1; \ + bits[(_stream)] <<=3D nbBits; \ + } while (0) + + /* Manually unroll the loop because compilers don't consistently + * unroll the inner loops, which destroys performance. + */ + do { + /* Decode 5 symbols in each of the 4 streams */ + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X1_DECODE_SYMBOL, 0); + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X1_DECODE_SYMBOL, 1); + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X1_DECODE_SYMBOL, 2); + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X1_DECODE_SYMBOL, 3); + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X1_DECODE_SYMBOL, 4); + + /* Reload each of the 4 the bitstreams */ + HUF_4X_FOR_EACH_STREAM(HUF_4X1_RELOAD_STREAM); + } while (op[3] < olimit); + +#undef HUF_4X1_DECODE_SYMBOL +#undef HUF_4X1_RELOAD_STREAM + } =20 -static HUF_ASM_X86_64_BMI2_ATTRS +_out: + + /* Save the final values of each of the state variables back to args. = */ + ZSTD_memcpy(&args->bits, &bits, sizeof(bits)); + ZSTD_memcpy((void*)(&args->ip), &ip, sizeof(ip)); + ZSTD_memcpy(&args->op, &op, sizeof(op)); +} + +/* + * @returns @p dstSize on success (>=3D 6) + * 0 if the fallback implementation should be used + * An error if an error occurred + */ +static HUF_FAST_BMI2_ATTRS size_t -HUF_decompress4X1_usingDTable_internal_bmi2_asm( +HUF_decompress4X1_usingDTable_internal_fast( void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, - const HUF_DTable* DTable) + const HUF_DTable* DTable, + HUF_DecompressFastLoopFn loopFn) { void const* dt =3D DTable + 1; - const BYTE* const iend =3D (const BYTE*)cSrc + 6; - BYTE* const oend =3D (BYTE*)dst + dstSize; - HUF_DecompressAsmArgs args; - { - size_t const ret =3D HUF_DecompressAsmArgs_init(&args, dst, dstSiz= e, cSrc, cSrcSize, DTable); - FORWARD_IF_ERROR(ret, "Failed to init asm args"); - if (ret !=3D 0) - return HUF_decompress4X1_usingDTable_internal_bmi2(dst, dstSiz= e, cSrc, cSrcSize, DTable); + BYTE const* const ilowest =3D (BYTE const*)cSrc; + BYTE* const oend =3D ZSTD_maybeNullPtrAdd((BYTE*)dst, dstSize); + HUF_DecompressFastArgs args; + { size_t const ret =3D HUF_DecompressFastArgs_init(&args, dst, dstSi= ze, cSrc, cSrcSize, DTable); + FORWARD_IF_ERROR(ret, "Failed to init fast loop args"); + if (ret =3D=3D 0) + return 0; } =20 - assert(args.ip[0] >=3D args.ilimit); - HUF_decompress4X1_usingDTable_internal_bmi2_asm_loop(&args); + assert(args.ip[0] >=3D args.ilowest); + loopFn(&args); =20 - /* Our loop guarantees that ip[] >=3D ilimit and that we haven't + /* Our loop guarantees that ip[] >=3D ilowest and that we haven't * overwritten any op[]. 
*/ - assert(args.ip[0] >=3D iend); - assert(args.ip[1] >=3D iend); - assert(args.ip[2] >=3D iend); - assert(args.ip[3] >=3D iend); + assert(args.ip[0] >=3D ilowest); + assert(args.ip[0] >=3D ilowest); + assert(args.ip[1] >=3D ilowest); + assert(args.ip[2] >=3D ilowest); + assert(args.ip[3] >=3D ilowest); assert(args.op[3] <=3D oend); - (void)iend; + + assert(ilowest =3D=3D args.ilowest); + assert(ilowest + 6 =3D=3D args.iend[0]); + (void)ilowest; =20 /* finish bit streams one by one. */ - { - size_t const segmentSize =3D (dstSize+3) / 4; + { size_t const segmentSize =3D (dstSize+3) / 4; BYTE* segmentEnd =3D (BYTE*)dst; int i; for (i =3D 0; i < 4; ++i) { @@ -712,97 +885,59 @@ HUF_decompress4X1_usingDTable_internal_bmi2_asm( } =20 /* decoded size */ + assert(dstSize !=3D 0); return dstSize; } -#endif /* ZSTD_ENABLE_ASM_X86_64_BMI2 */ - -typedef size_t (*HUF_decompress_usingDTable_t)(void *dst, size_t dstSize, - const void *cSrc, - size_t cSrcSize, - const HUF_DTable *DTable); =20 HUF_DGEN(HUF_decompress1X1_usingDTable_internal) =20 static size_t HUF_decompress4X1_usingDTable_internal(void* dst, size_t dst= Size, void const* cSrc, - size_t cSrcSize, HUF_DTable const* DTable, int bmi2) + size_t cSrcSize, HUF_DTable const* DTable, int flags) { + HUF_DecompressUsingDTableFn fallbackFn =3D HUF_decompress4X1_usingDTab= le_internal_default; + HUF_DecompressFastLoopFn loopFn =3D HUF_decompress4X1_usingDTable_inte= rnal_fast_c_loop; + #if DYNAMIC_BMI2 - if (bmi2) { + if (flags & HUF_flags_bmi2) { + fallbackFn =3D HUF_decompress4X1_usingDTable_internal_bmi2; # if ZSTD_ENABLE_ASM_X86_64_BMI2 - return HUF_decompress4X1_usingDTable_internal_bmi2_asm(dst, dstSiz= e, cSrc, cSrcSize, DTable); -# else - return HUF_decompress4X1_usingDTable_internal_bmi2(dst, dstSize, c= Src, cSrcSize, DTable); + if (!(flags & HUF_flags_disableAsm)) { + loopFn =3D HUF_decompress4X1_usingDTable_internal_fast_asm_loo= p; + } # endif + } else { + return fallbackFn(dst, dstSize, cSrc, cSrcSize, DTable); } -#else - (void)bmi2; #endif =20 #if ZSTD_ENABLE_ASM_X86_64_BMI2 && defined(__BMI2__) - return HUF_decompress4X1_usingDTable_internal_bmi2_asm(dst, dstSize, c= Src, cSrcSize, DTable); -#else - return HUF_decompress4X1_usingDTable_internal_default(dst, dstSize, cS= rc, cSrcSize, DTable); + if (!(flags & HUF_flags_disableAsm)) { + loopFn =3D HUF_decompress4X1_usingDTable_internal_fast_asm_loop; + } #endif -} - - -size_t HUF_decompress1X1_usingDTable( - void* dst, size_t dstSize, - const void* cSrc, size_t cSrcSize, - const HUF_DTable* DTable) -{ - DTableDesc dtd =3D HUF_getDTableDesc(DTable); - if (dtd.tableType !=3D 0) return ERROR(GENERIC); - return HUF_decompress1X1_usingDTable_internal(dst, dstSize, cSrc, cSrc= Size, DTable, /* bmi2 */ 0); -} =20 -size_t HUF_decompress1X1_DCtx_wksp(HUF_DTable* DCtx, void* dst, size_t dst= Size, - const void* cSrc, size_t cSrcSize, - void* workSpace, size_t wkspSize) -{ - const BYTE* ip =3D (const BYTE*) cSrc; - - size_t const hSize =3D HUF_readDTableX1_wksp(DCtx, cSrc, cSrcSize, wor= kSpace, wkspSize); - if (HUF_isError(hSize)) return hSize; - if (hSize >=3D cSrcSize) return ERROR(srcSize_wrong); - ip +=3D hSize; cSrcSize -=3D hSize; - - return HUF_decompress1X1_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, DCtx, /* bmi2 */ 0); -} - - -size_t HUF_decompress4X1_usingDTable( - void* dst, size_t dstSize, - const void* cSrc, size_t cSrcSize, - const HUF_DTable* DTable) -{ - DTableDesc dtd =3D HUF_getDTableDesc(DTable); - if (dtd.tableType !=3D 0) return ERROR(GENERIC); - return 
HUF_decompress4X1_usingDTable_internal(dst, dstSize, cSrc, cSrc= Size, DTable, /* bmi2 */ 0); + if (HUF_ENABLE_FAST_DECODE && !(flags & HUF_flags_disableFast)) { + size_t const ret =3D HUF_decompress4X1_usingDTable_internal_fast(d= st, dstSize, cSrc, cSrcSize, DTable, loopFn); + if (ret !=3D 0) + return ret; + } + return fallbackFn(dst, dstSize, cSrc, cSrcSize, DTable); } =20 -static size_t HUF_decompress4X1_DCtx_wksp_bmi2(HUF_DTable* dctx, void* dst= , size_t dstSize, +static size_t HUF_decompress4X1_DCtx_wksp(HUF_DTable* dctx, void* dst, siz= e_t dstSize, const void* cSrc, size_t cSrcSize, - void* workSpace, size_t wkspSize, int b= mi2) + void* workSpace, size_t wkspSize, int f= lags) { const BYTE* ip =3D (const BYTE*) cSrc; =20 - size_t const hSize =3D HUF_readDTableX1_wksp_bmi2(dctx, cSrc, cSrcSize= , workSpace, wkspSize, bmi2); + size_t const hSize =3D HUF_readDTableX1_wksp(dctx, cSrc, cSrcSize, wor= kSpace, wkspSize, flags); if (HUF_isError(hSize)) return hSize; if (hSize >=3D cSrcSize) return ERROR(srcSize_wrong); ip +=3D hSize; cSrcSize -=3D hSize; =20 - return HUF_decompress4X1_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, dctx, bmi2); -} - -size_t HUF_decompress4X1_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dst= Size, - const void* cSrc, size_t cSrcSize, - void* workSpace, size_t wkspSize) -{ - return HUF_decompress4X1_DCtx_wksp_bmi2(dctx, dst, dstSize, cSrc, cSrc= Size, workSpace, wkspSize, 0); + return HUF_decompress4X1_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, dctx, flags); } =20 - #endif /* HUF_FORCE_DECOMPRESS_X2 */ =20 =20 @@ -985,7 +1120,7 @@ static void HUF_fillDTableX2Level2(HUF_DEltX2* DTable,= U32 targetLog, const U32 =20 static void HUF_fillDTableX2(HUF_DEltX2* DTable, const U32 targetLog, const sortedSymbol_t* sortedList, - const U32* rankStart, rankValCol_t *rankValOrig= in, const U32 maxWeight, + const U32* rankStart, rankValCol_t* rankValOrig= in, const U32 maxWeight, const U32 nbBitsBaseline) { U32* const rankVal =3D rankValOrigin[0]; @@ -1040,14 +1175,7 @@ typedef struct { =20 size_t HUF_readDTableX2_wksp(HUF_DTable* DTable, const void* src, size_t srcSize, - void* workSpace, size_t wkspSize) -{ - return HUF_readDTableX2_wksp_bmi2(DTable, src, srcSize, workSpace, wks= pSize, /* bmi2 */ 0); -} - -size_t HUF_readDTableX2_wksp_bmi2(HUF_DTable* DTable, - const void* src, size_t srcSize, - void* workSpace, size_t wkspSize, int bmi2) + void* workSpace, size_t wkspSize, int flags) { U32 tableLog, maxW, nbSymbols; DTableDesc dtd =3D HUF_getDTableDesc(DTable); @@ -1069,7 +1197,7 @@ size_t HUF_readDTableX2_wksp_bmi2(HUF_DTable* DTable, if (maxTableLog > HUF_TABLELOG_MAX) return ERROR(tableLog_tooLarge); /* ZSTD_memset(weightList, 0, sizeof(weightList)); */ /* is not neces= sary, even though some analyzer complain ... 
*/ =20 - iSize =3D HUF_readStats_wksp(wksp->weightList, HUF_SYMBOLVALUE_MAX + 1= , wksp->rankStats, &nbSymbols, &tableLog, src, srcSize, wksp->calleeWksp, s= izeof(wksp->calleeWksp), bmi2); + iSize =3D HUF_readStats_wksp(wksp->weightList, HUF_SYMBOLVALUE_MAX + 1= , wksp->rankStats, &nbSymbols, &tableLog, src, srcSize, wksp->calleeWksp, s= izeof(wksp->calleeWksp), flags); if (HUF_isError(iSize)) return iSize; =20 /* check result */ @@ -1159,15 +1287,19 @@ HUF_decodeLastSymbolX2(void* op, BIT_DStream_t* DSt= ream, const HUF_DEltX2* dt, c } =20 #define HUF_DECODE_SYMBOLX2_0(ptr, DStreamPtr) \ - ptr +=3D HUF_decodeSymbolX2(ptr, DStreamPtr, dt, dtLog) + do { ptr +=3D HUF_decodeSymbolX2(ptr, DStreamPtr, dt, dtLog); } while = (0) =20 -#define HUF_DECODE_SYMBOLX2_1(ptr, DStreamPtr) \ - if (MEM_64bits() || (HUF_TABLELOG_MAX<=3D12)) \ - ptr +=3D HUF_decodeSymbolX2(ptr, DStreamPtr, dt, dtLog) +#define HUF_DECODE_SYMBOLX2_1(ptr, DStreamPtr) \ + do { \ + if (MEM_64bits() || (HUF_TABLELOG_MAX<=3D12)) \ + ptr +=3D HUF_decodeSymbolX2(ptr, DStreamPtr, dt, dtLog); \ + } while (0) =20 -#define HUF_DECODE_SYMBOLX2_2(ptr, DStreamPtr) \ - if (MEM_64bits()) \ - ptr +=3D HUF_decodeSymbolX2(ptr, DStreamPtr, dt, dtLog) +#define HUF_DECODE_SYMBOLX2_2(ptr, DStreamPtr) \ + do { \ + if (MEM_64bits()) \ + ptr +=3D HUF_decodeSymbolX2(ptr, DStreamPtr, dt, dtLog); \ + } while (0) =20 HINT_INLINE size_t HUF_decodeStreamX2(BYTE* p, BIT_DStream_t* bitDPtr, BYTE* const pEnd, @@ -1227,7 +1359,7 @@ HUF_decompress1X2_usingDTable_internal_body( =20 /* decode */ { BYTE* const ostart =3D (BYTE*) dst; - BYTE* const oend =3D ostart + dstSize; + BYTE* const oend =3D ZSTD_maybeNullPtrAdd(ostart, dstSize); const void* const dtPtr =3D DTable+1; /* force compiler to not u= se strict-aliasing */ const HUF_DEltX2* const dt =3D (const HUF_DEltX2*)dtPtr; DTableDesc const dtd =3D HUF_getDTableDesc(DTable); @@ -1240,6 +1372,11 @@ HUF_decompress1X2_usingDTable_internal_body( /* decoded size */ return dstSize; } + +/* HUF_decompress4X2_usingDTable_internal_body(): + * Conditions: + * @dstSize >=3D 6 + */ FORCE_INLINE_TEMPLATE size_t HUF_decompress4X2_usingDTable_internal_body( void* dst, size_t dstSize, @@ -1247,6 +1384,7 @@ HUF_decompress4X2_usingDTable_internal_body( const HUF_DTable* DTable) { if (cSrcSize < 10) return ERROR(corruption_detected); /* strict mini= mum : jump table + 1 byte per stream */ + if (dstSize < 6) return ERROR(corruption_detected); /* stream = 4-split doesn't work */ =20 { const BYTE* const istart =3D (const BYTE*) cSrc; BYTE* const ostart =3D (BYTE*) dst; @@ -1280,8 +1418,9 @@ HUF_decompress4X2_usingDTable_internal_body( DTableDesc const dtd =3D HUF_getDTableDesc(DTable); U32 const dtLog =3D dtd.tableLog; =20 - if (length4 > cSrcSize) return ERROR(corruption_detected); /* ov= erflow */ - if (opStart4 > oend) return ERROR(corruption_detected); /* ov= erflow */ + if (length4 > cSrcSize) return ERROR(corruption_detected); /* ove= rflow */ + if (opStart4 > oend) return ERROR(corruption_detected); /* ove= rflow */ + assert(dstSize >=3D 6 /* validated above */); CHECK_F( BIT_initDStream(&bitD1, istart1, length1) ); CHECK_F( BIT_initDStream(&bitD2, istart2, length2) ); CHECK_F( BIT_initDStream(&bitD3, istart3, length3) ); @@ -1366,44 +1505,191 @@ size_t HUF_decompress4X2_usingDTable_internal_bmi2= (void* dst, size_t dstSize, vo } #endif =20 -#if HUF_NEED_DEFAULT_FUNCTION static size_t HUF_decompress4X2_usingDTable_internal_default(void* dst, size_t ds= tSize, void const* cSrc, size_t cSrcSize, HUF_DTable const* DTable) { return 
HUF_decompress4X2_usingDTable_internal_body(dst, dstSize, cSrc,= cSrcSize, DTable); } -#endif =20 #if ZSTD_ENABLE_ASM_X86_64_BMI2 =20 -HUF_ASM_DECL void HUF_decompress4X2_usingDTable_internal_bmi2_asm_loop(HUF= _DecompressAsmArgs* args) ZSTDLIB_HIDDEN; +HUF_ASM_DECL void HUF_decompress4X2_usingDTable_internal_fast_asm_loop(HUF= _DecompressFastArgs* args) ZSTDLIB_HIDDEN; + +#endif + +static HUF_FAST_BMI2_ATTRS +void HUF_decompress4X2_usingDTable_internal_fast_c_loop(HUF_DecompressFast= Args* args) +{ + U64 bits[4]; + BYTE const* ip[4]; + BYTE* op[4]; + BYTE* oend[4]; + HUF_DEltX2 const* const dtable =3D (HUF_DEltX2 const*)args->dt; + BYTE const* const ilowest =3D args->ilowest; + + /* Copy the arguments to local registers. */ + ZSTD_memcpy(&bits, &args->bits, sizeof(bits)); + ZSTD_memcpy((void*)(&ip), &args->ip, sizeof(ip)); + ZSTD_memcpy(&op, &args->op, sizeof(op)); + + oend[0] =3D op[1]; + oend[1] =3D op[2]; + oend[2] =3D op[3]; + oend[3] =3D args->oend; + + assert(MEM_isLittleEndian()); + assert(!MEM_32bits()); + + for (;;) { + BYTE* olimit; + int stream; + + /* Assert loop preconditions */ +#ifndef NDEBUG + for (stream =3D 0; stream < 4; ++stream) { + assert(op[stream] <=3D oend[stream]); + assert(ip[stream] >=3D ilowest); + } +#endif + /* Compute olimit */ + { + /* Each loop does 5 table lookups for each of the 4 streams. + * Each table lookup consumes up to 11 bits of input, and prod= uces + * up to 2 bytes of output. + */ + /* We can consume up to 7 bytes of input per iteration per str= eam. + * We also know that each input pointer is >=3D ip[0]. So we c= an run + * iters loops before running out of input. + */ + size_t iters =3D (size_t)(ip[0] - ilowest) / 7; + /* Each iteration can produce up to 10 bytes of output per str= eam. + * Each output stream may advance at different rates. So take t= he + * minimum number of safe iterations among all the output stre= ams. + */ + for (stream =3D 0; stream < 4; ++stream) { + size_t const oiters =3D (size_t)(oend[stream] - op[stream]= ) / 10; + iters =3D MIN(iters, oiters); + } + + /* Each iteration produces at least 5 output symbols. So until + * op[3] crosses olimit, we know we haven't executed iters + * iterations yet. This saves us maintaining an iters counter, + * at the expense of computing the remaining # of iterations + * more frequently. + */ + olimit =3D op[3] + (iters * 5); + + /* Exit the fast decoding loop once we reach the end. */ + if (op[3] =3D=3D olimit) + break; + + /* Exit the decoding loop if any input pointer has crossed the + * previous one. This indicates corruption, and a precondition + * to our loop is that ip[i] >=3D ip[0]. + */ + for (stream =3D 1; stream < 4; ++stream) { + if (ip[stream] < ip[stream - 1]) + goto _out; + } + } + +#ifndef NDEBUG + for (stream =3D 1; stream < 4; ++stream) { + assert(ip[stream] >=3D ip[stream - 1]); + } +#endif
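The olimit computation above is what lets the unrolled loop below run without any per-symbol bounds checks: the per-iteration worst cases (at most 7 input bytes consumed and 10 output bytes produced per stream, at least 5 output bytes written to stream 3) are folded into a single pointer threshold. The following standalone sketch restates that arithmetic; it is illustrative only, not code from this patch, and compute_olimit is a made-up name:

    #include <stddef.h>

    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    static unsigned char* compute_olimit(unsigned char* op[4],
                                         unsigned char* oend[4],
                                         const unsigned char* ip0,
                                         const unsigned char* ilowest)
    {
        /* Every input pointer is >= ip[0], so ip[0]'s headroom bounds all
         * four streams: 5 lookups x 11 bits < 7 bytes per iteration. */
        size_t iters = (size_t)(ip0 - ilowest) / 7;
        int s;
        for (s = 0; s < 4; ++s) {
            /* 5 lookups x 2 bytes: at most 10 output bytes per iteration,
             * so the tightest output stream limits everyone. */
            size_t const oiters = (size_t)(oend[s] - op[s]) / 10;
            iters = MIN(iters, oiters);
        }
        /* Stream 3 writes at least 5 bytes per iteration, so testing
         * op[3] < olimit replaces an explicit iteration counter. */
        return op[3] + iters * 5;
    }

When iters comes out as 0, the threshold equals op[3] itself, which is exactly the early exit taken before the unrolled loop is entered.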
=20 -static HUF_ASM_X86_64_BMI2_ATTRS size_t -HUF_decompress4X2_usingDTable_internal_bmi2_asm( +#define HUF_4X2_DECODE_SYMBOL(_stream, _decode3) \ + do { \ + if ((_decode3) || (_stream) !=3D 3) { \ + int const index =3D (int)(bits[(_stream)] >> 53); \ + HUF_DEltX2 const entry =3D dtable[index]; \ + MEM_write16(op[(_stream)], entry.sequence); \ + bits[(_stream)] <<=3D (entry.nbBits) & 0x3F; \ + op[(_stream)] +=3D (entry.length); \ + } \ + } while (0) + +#define HUF_4X2_RELOAD_STREAM(_stream) \ + do { \ + HUF_4X2_DECODE_SYMBOL(3, 1); \ + { \ + int const ctz =3D ZSTD_countTrailingZeros64(bits[(_stream)]); \ + int const nbBits =3D ctz & 7; \ + int const nbBytes =3D ctz >> 3; \ + ip[(_stream)] -=3D nbBytes; \ + bits[(_stream)] =3D MEM_read64(ip[(_stream)]) | 1; \ + bits[(_stream)] <<=3D nbBits; \ + } \ + } while (0) + + /* Manually unroll the loop because compilers don't consistently + * unroll the inner loops, which destroys performance. + */ + do { + /* Decode 5 symbols from each of the first 3 streams. + * The final stream will be decoded during the reload phase + * to reduce register pressure. + */ + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X2_DECODE_SYMBOL, 0); + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X2_DECODE_SYMBOL, 0); + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X2_DECODE_SYMBOL, 0); + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X2_DECODE_SYMBOL, 0); + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X2_DECODE_SYMBOL, 0); + + /* Decode one symbol from the final stream */ + HUF_4X2_DECODE_SYMBOL(3, 1); + + /* Decode 4 symbols from the final stream & reload bitstreams. + * The final stream is reloaded last, meaning that all 5 symbo= ls + * are decoded from the final stream before it is reloaded. + */ + HUF_4X_FOR_EACH_STREAM(HUF_4X2_RELOAD_STREAM); + } while (op[3] < olimit); + } + +#undef HUF_4X2_DECODE_SYMBOL +#undef HUF_4X2_RELOAD_STREAM + +_out: + + /* Save the final values of each of the state variables back to args.
= */ + ZSTD_memcpy(&args->bits, &bits, sizeof(bits)); + ZSTD_memcpy((void*)(&args->ip), &ip, sizeof(ip)); + ZSTD_memcpy(&args->op, &op, sizeof(op)); +} + + +static HUF_FAST_BMI2_ATTRS size_t +HUF_decompress4X2_usingDTable_internal_fast( void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, - const HUF_DTable* DTable) { + const HUF_DTable* DTable, + HUF_DecompressFastLoopFn loopFn) { void const* dt =3D DTable + 1; - const BYTE* const iend =3D (const BYTE*)cSrc + 6; - BYTE* const oend =3D (BYTE*)dst + dstSize; - HUF_DecompressAsmArgs args; + const BYTE* const ilowest =3D (const BYTE*)cSrc; + BYTE* const oend =3D ZSTD_maybeNullPtrAdd((BYTE*)dst, dstSize); + HUF_DecompressFastArgs args; { - size_t const ret =3D HUF_DecompressAsmArgs_init(&args, dst, dstSiz= e, cSrc, cSrcSize, DTable); + size_t const ret =3D HUF_DecompressFastArgs_init(&args, dst, dstSi= ze, cSrc, cSrcSize, DTable); FORWARD_IF_ERROR(ret, "Failed to init asm args"); - if (ret !=3D 0) - return HUF_decompress4X2_usingDTable_internal_bmi2(dst, dstSiz= e, cSrc, cSrcSize, DTable); + if (ret =3D=3D 0) + return 0; } =20 - assert(args.ip[0] >=3D args.ilimit); - HUF_decompress4X2_usingDTable_internal_bmi2_asm_loop(&args); + assert(args.ip[0] >=3D args.ilowest); + loopFn(&args); =20 /* note : op4 already verified within main loop */ - assert(args.ip[0] >=3D iend); - assert(args.ip[1] >=3D iend); - assert(args.ip[2] >=3D iend); - assert(args.ip[3] >=3D iend); + assert(args.ip[0] >=3D ilowest); + assert(args.ip[1] >=3D ilowest); + assert(args.ip[2] >=3D ilowest); + assert(args.ip[3] >=3D ilowest); assert(args.op[3] <=3D oend); - (void)iend; + + assert(ilowest =3D=3D args.ilowest); + assert(ilowest + 6 =3D=3D args.iend[0]); + (void)ilowest; =20 /* finish bitStreams one by one */ { @@ -1426,91 +1712,72 @@ HUF_decompress4X2_usingDTable_internal_bmi2_asm( /* decoded size */ return dstSize; } -#endif /* ZSTD_ENABLE_ASM_X86_64_BMI2 */ =20 static size_t HUF_decompress4X2_usingDTable_internal(void* dst, size_t dst= Size, void const* cSrc, - size_t cSrcSize, HUF_DTable const* DTable, int bmi2) + size_t cSrcSize, HUF_DTable const* DTable, int flags) { + HUF_DecompressUsingDTableFn fallbackFn =3D HUF_decompress4X2_usingDTab= le_internal_default; + HUF_DecompressFastLoopFn loopFn =3D HUF_decompress4X2_usingDTable_inte= rnal_fast_c_loop; + #if DYNAMIC_BMI2 - if (bmi2) { + if (flags & HUF_flags_bmi2) { + fallbackFn =3D HUF_decompress4X2_usingDTable_internal_bmi2; # if ZSTD_ENABLE_ASM_X86_64_BMI2 - return HUF_decompress4X2_usingDTable_internal_bmi2_asm(dst, dstSiz= e, cSrc, cSrcSize, DTable); -# else - return HUF_decompress4X2_usingDTable_internal_bmi2(dst, dstSize, c= Src, cSrcSize, DTable); + if (!(flags & HUF_flags_disableAsm)) { + loopFn =3D HUF_decompress4X2_usingDTable_internal_fast_asm_loo= p; + } # endif + } else { + return fallbackFn(dst, dstSize, cSrc, cSrcSize, DTable); } -#else - (void)bmi2; #endif =20 #if ZSTD_ENABLE_ASM_X86_64_BMI2 && defined(__BMI2__) - return HUF_decompress4X2_usingDTable_internal_bmi2_asm(dst, dstSize, c= Src, cSrcSize, DTable); -#else - return HUF_decompress4X2_usingDTable_internal_default(dst, dstSize, cS= rc, cSrcSize, DTable); + if (!(flags & HUF_flags_disableAsm)) { + loopFn =3D HUF_decompress4X2_usingDTable_internal_fast_asm_loop; + } #endif + + if (HUF_ENABLE_FAST_DECODE && !(flags & HUF_flags_disableFast)) { + size_t const ret =3D HUF_decompress4X2_usingDTable_internal_fast(d= st, dstSize, cSrc, cSrcSize, DTable, loopFn); + if (ret !=3D 0) + return ret; + } + return fallbackFn(dst, dstSize, cSrc, cSrcSize, 
DTable); } =20 HUF_DGEN(HUF_decompress1X2_usingDTable_internal) =20 -size_t HUF_decompress1X2_usingDTable( - void* dst, size_t dstSize, - const void* cSrc, size_t cSrcSize, - const HUF_DTable* DTable) -{ - DTableDesc dtd =3D HUF_getDTableDesc(DTable); - if (dtd.tableType !=3D 1) return ERROR(GENERIC); - return HUF_decompress1X2_usingDTable_internal(dst, dstSize, cSrc, cSrc= Size, DTable, /* bmi2 */ 0); -} - size_t HUF_decompress1X2_DCtx_wksp(HUF_DTable* DCtx, void* dst, size_t dst= Size, const void* cSrc, size_t cSrcSize, - void* workSpace, size_t wkspSize) + void* workSpace, size_t wkspSize, int f= lags) { const BYTE* ip =3D (const BYTE*) cSrc; =20 size_t const hSize =3D HUF_readDTableX2_wksp(DCtx, cSrc, cSrcSize, - workSpace, wkspSize); + workSpace, wkspSize, flags); if (HUF_isError(hSize)) return hSize; if (hSize >=3D cSrcSize) return ERROR(srcSize_wrong); ip +=3D hSize; cSrcSize -=3D hSize; =20 - return HUF_decompress1X2_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, DCtx, /* bmi2 */ 0); + return HUF_decompress1X2_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, DCtx, flags); } =20 - -size_t HUF_decompress4X2_usingDTable( - void* dst, size_t dstSize, - const void* cSrc, size_t cSrcSize, - const HUF_DTable* DTable) -{ - DTableDesc dtd =3D HUF_getDTableDesc(DTable); - if (dtd.tableType !=3D 1) return ERROR(GENERIC); - return HUF_decompress4X2_usingDTable_internal(dst, dstSize, cSrc, cSrc= Size, DTable, /* bmi2 */ 0); -} - -static size_t HUF_decompress4X2_DCtx_wksp_bmi2(HUF_DTable* dctx, void* dst= , size_t dstSize, +static size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable* dctx, void* dst, siz= e_t dstSize, const void* cSrc, size_t cSrcSize, - void* workSpace, size_t wkspSize, int b= mi2) + void* workSpace, size_t wkspSize, int f= lags) { const BYTE* ip =3D (const BYTE*) cSrc; =20 size_t hSize =3D HUF_readDTableX2_wksp(dctx, cSrc, cSrcSize, - workSpace, wkspSize); + workSpace, wkspSize, flags); if (HUF_isError(hSize)) return hSize; if (hSize >=3D cSrcSize) return ERROR(srcSize_wrong); ip +=3D hSize; cSrcSize -=3D hSize; =20 - return HUF_decompress4X2_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, dctx, bmi2); + return HUF_decompress4X2_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, dctx, flags); } =20 -size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dst= Size, - const void* cSrc, size_t cSrcSize, - void* workSpace, size_t wkspSize) -{ - return HUF_decompress4X2_DCtx_wksp_bmi2(dctx, dst, dstSize, cSrc, cSrc= Size, workSpace, wkspSize, /* bmi2 */ 0); -} - - #endif /* HUF_FORCE_DECOMPRESS_X1 */ =20 =20 @@ -1518,44 +1785,6 @@ size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable* dctx,= void* dst, size_t dstSize, /* Universal decompression selectors */ /* ***********************************/ =20 -size_t HUF_decompress1X_usingDTable(void* dst, size_t maxDstSize, - const void* cSrc, size_t cSrcSize, - const HUF_DTable* DTable) -{ - DTableDesc const dtd =3D HUF_getDTableDesc(DTable); -#if defined(HUF_FORCE_DECOMPRESS_X1) - (void)dtd; - assert(dtd.tableType =3D=3D 0); - return HUF_decompress1X1_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, /* bmi2 */ 0); -#elif defined(HUF_FORCE_DECOMPRESS_X2) - (void)dtd; - assert(dtd.tableType =3D=3D 1); - return HUF_decompress1X2_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, /* bmi2 */ 0); -#else - return dtd.tableType ? 
HUF_decompress1X2_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0) : - HUF_decompress1X1_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0); -#endif -} - -size_t HUF_decompress4X_usingDTable(void* dst, size_t maxDstSize, - const void* cSrc, size_t cSrcSize, - const HUF_DTable* DTable) -{ - DTableDesc const dtd =3D HUF_getDTableDesc(DTable); -#if defined(HUF_FORCE_DECOMPRESS_X1) - (void)dtd; - assert(dtd.tableType =3D=3D 0); - return HUF_decompress4X1_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, /* bmi2 */ 0); -#elif defined(HUF_FORCE_DECOMPRESS_X2) - (void)dtd; - assert(dtd.tableType =3D=3D 1); - return HUF_decompress4X2_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, /* bmi2 */ 0); -#else - return dtd.tableType ? HUF_decompress4X2_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0) : - HUF_decompress4X1_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0); -#endif -} - =20 #if !defined(HUF_FORCE_DECOMPRESS_X1) && !defined(HUF_FORCE_DECOMPRESS_X2) typedef struct { U32 tableTime; U32 decode256Time; } algo_time_t; @@ -1610,36 +1839,9 @@ U32 HUF_selectDecoder (size_t dstSize, size_t cSrcSi= ze) #endif } =20 - -size_t HUF_decompress4X_hufOnly_wksp(HUF_DTable* dctx, void* dst, - size_t dstSize, const void* cSrc, - size_t cSrcSize, void* workSpace, - size_t wkspSize) -{ - /* validation checks */ - if (dstSize =3D=3D 0) return ERROR(dstSize_tooSmall); - if (cSrcSize =3D=3D 0) return ERROR(corruption_detected); - - { U32 const algoNb =3D HUF_selectDecoder(dstSize, cSrcSize); -#if defined(HUF_FORCE_DECOMPRESS_X1) - (void)algoNb; - assert(algoNb =3D=3D 0); - return HUF_decompress4X1_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcS= ize, workSpace, wkspSize); -#elif defined(HUF_FORCE_DECOMPRESS_X2) - (void)algoNb; - assert(algoNb =3D=3D 1); - return HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcS= ize, workSpace, wkspSize); -#else - return algoNb ? HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cS= rc, - cSrcSize, workSpace, wkspSize): - HUF_decompress4X1_DCtx_wksp(dctx, dst, dstSize, cS= rc, cSrcSize, workSpace, wkspSize); -#endif - } -} - size_t HUF_decompress1X_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstS= ize, const void* cSrc, size_t cSrcSize, - void* workSpace, size_t wkspSize) + void* workSpace, size_t wkspSize, int fl= ags) { /* validation checks */ if (dstSize =3D=3D 0) return ERROR(dstSize_tooSmall); @@ -1652,71 +1854,71 @@ size_t HUF_decompress1X_DCtx_wksp(HUF_DTable* dctx,= void* dst, size_t dstSize, (void)algoNb; assert(algoNb =3D=3D 0); return HUF_decompress1X1_DCtx_wksp(dctx, dst, dstSize, cSrc, - cSrcSize, workSpace, wkspSize); + cSrcSize, workSpace, wkspSize, flags); #elif defined(HUF_FORCE_DECOMPRESS_X2) (void)algoNb; assert(algoNb =3D=3D 1); return HUF_decompress1X2_DCtx_wksp(dctx, dst, dstSize, cSrc, - cSrcSize, workSpace, wkspSize); + cSrcSize, workSpace, wkspSize, flags); #else return algoNb ? 
HUF_decompress1X2_DCtx_wksp(dctx, dst, dstSize, cS= rc, - cSrcSize, workSpace, wkspSize): + cSrcSize, workSpace, wkspSize, flags): HUF_decompress1X1_DCtx_wksp(dctx, dst, dstSize, cS= rc, - cSrcSize, workSpace, wkspSize); + cSrcSize, workSpace, wkspSize, flags); #endif } } =20 =20 -size_t HUF_decompress1X_usingDTable_bmi2(void* dst, size_t maxDstSize, con= st void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int bmi2) +size_t HUF_decompress1X_usingDTable(void* dst, size_t maxDstSize, const vo= id* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int flags) { DTableDesc const dtd =3D HUF_getDTableDesc(DTable); #if defined(HUF_FORCE_DECOMPRESS_X1) (void)dtd; assert(dtd.tableType =3D=3D 0); - return HUF_decompress1X1_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, bmi2); + return HUF_decompress1X1_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, flags); #elif defined(HUF_FORCE_DECOMPRESS_X2) (void)dtd; assert(dtd.tableType =3D=3D 1); - return HUF_decompress1X2_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, bmi2); + return HUF_decompress1X2_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, flags); #else - return dtd.tableType ? HUF_decompress1X2_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, bmi2) : - HUF_decompress1X1_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, bmi2); + return dtd.tableType ? HUF_decompress1X2_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, flags) : + HUF_decompress1X1_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, flags); #endif } =20 #ifndef HUF_FORCE_DECOMPRESS_X2 -size_t HUF_decompress1X1_DCtx_wksp_bmi2(HUF_DTable* dctx, void* dst, size_= t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspS= ize, int bmi2) +size_t HUF_decompress1X1_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dst= Size, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, = int flags) { const BYTE* ip =3D (const BYTE*) cSrc; =20 - size_t const hSize =3D HUF_readDTableX1_wksp_bmi2(dctx, cSrc, cSrcSize= , workSpace, wkspSize, bmi2); + size_t const hSize =3D HUF_readDTableX1_wksp(dctx, cSrc, cSrcSize, wor= kSpace, wkspSize, flags); if (HUF_isError(hSize)) return hSize; if (hSize >=3D cSrcSize) return ERROR(srcSize_wrong); ip +=3D hSize; cSrcSize -=3D hSize; =20 - return HUF_decompress1X1_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, dctx, bmi2); + return HUF_decompress1X1_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, dctx, flags); } #endif =20 -size_t HUF_decompress4X_usingDTable_bmi2(void* dst, size_t maxDstSize, con= st void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int bmi2) +size_t HUF_decompress4X_usingDTable(void* dst, size_t maxDstSize, const vo= id* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int flags) { DTableDesc const dtd =3D HUF_getDTableDesc(DTable); #if defined(HUF_FORCE_DECOMPRESS_X1) (void)dtd; assert(dtd.tableType =3D=3D 0); - return HUF_decompress4X1_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, bmi2); + return HUF_decompress4X1_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, flags); #elif defined(HUF_FORCE_DECOMPRESS_X2) (void)dtd; assert(dtd.tableType =3D=3D 1); - return HUF_decompress4X2_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, bmi2); + return HUF_decompress4X2_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, flags); #else - return dtd.tableType ? 
HUF_decompress4X2_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, bmi2) : - HUF_decompress4X1_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, bmi2); + return dtd.tableType ? HUF_decompress4X2_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, flags) : + HUF_decompress4X1_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, flags); #endif } =20 -size_t HUF_decompress4X_hufOnly_wksp_bmi2(HUF_DTable* dctx, void* dst, siz= e_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wks= pSize, int bmi2) +size_t HUF_decompress4X_hufOnly_wksp(HUF_DTable* dctx, void* dst, size_t d= stSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize= , int flags) { /* validation checks */ if (dstSize =3D=3D 0) return ERROR(dstSize_tooSmall); @@ -1726,15 +1928,14 @@ size_t HUF_decompress4X_hufOnly_wksp_bmi2(HUF_DTabl= e* dctx, void* dst, size_t ds #if defined(HUF_FORCE_DECOMPRESS_X1) (void)algoNb; assert(algoNb =3D=3D 0); - return HUF_decompress4X1_DCtx_wksp_bmi2(dctx, dst, dstSize, cSrc, = cSrcSize, workSpace, wkspSize, bmi2); + return HUF_decompress4X1_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcS= ize, workSpace, wkspSize, flags); #elif defined(HUF_FORCE_DECOMPRESS_X2) (void)algoNb; assert(algoNb =3D=3D 1); - return HUF_decompress4X2_DCtx_wksp_bmi2(dctx, dst, dstSize, cSrc, = cSrcSize, workSpace, wkspSize, bmi2); + return HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcS= ize, workSpace, wkspSize, flags); #else - return algoNb ? HUF_decompress4X2_DCtx_wksp_bmi2(dctx, dst, dstSiz= e, cSrc, cSrcSize, workSpace, wkspSize, bmi2) : - HUF_decompress4X1_DCtx_wksp_bmi2(dctx, dst, dstSiz= e, cSrc, cSrcSize, workSpace, wkspSize, bmi2); + return algoNb ? HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cS= rc, cSrcSize, workSpace, wkspSize, flags) : + HUF_decompress4X1_DCtx_wksp(dctx, dst, dstSize, cS= rc, cSrcSize, workSpace, wkspSize, flags); #endif } } - diff --git a/lib/zstd/decompress/zstd_ddict.c b/lib/zstd/decompress/zstd_dd= ict.c index dbbc7919de53..30ef65e1ab5c 100644 --- a/lib/zstd/decompress/zstd_ddict.c +++ b/lib/zstd/decompress/zstd_ddict.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -14,12 +15,12 @@ /*-******************************************************* * Dependencies *********************************************************/ +#include "../common/allocations.h" /* ZSTD_customMalloc, ZSTD_customFree = */ #include "../common/zstd_deps.h" /* ZSTD_memcpy, ZSTD_memmove, ZSTD_mems= et */ #include "../common/cpu.h" /* bmi2 */ #include "../common/mem.h" /* low level memory routines */ #define FSE_STATIC_LINKING_ONLY #include "../common/fse.h" -#define HUF_STATIC_LINKING_ONLY #include "../common/huf.h" #include "zstd_decompress_internal.h" #include "zstd_ddict.h" @@ -131,7 +132,7 @@ static size_t ZSTD_initDDict_internal(ZSTD_DDict* ddict, ZSTD_memcpy(internalBuffer, dict, dictSize); } ddict->dictSize =3D dictSize; - ddict->entropy.hufTable[0] =3D (HUF_DTable)((HufLog)*0x1000001); /* c= over both little and big endian */ + ddict->entropy.hufTable[0] =3D (HUF_DTable)((ZSTD_HUFFDTABLE_CAPACITY_= LOG)*0x1000001); /* cover both little and big endian */ =20 /* parse dictionary content */ FORWARD_IF_ERROR( ZSTD_loadEntropy_intoDDict(ddict, dictContentType) ,= ""); @@ -237,5 +238,5 @@ size_t ZSTD_sizeof_DDict(const ZSTD_DDict* ddict) unsigned ZSTD_getDictID_fromDDict(const ZSTD_DDict* ddict) { if (ddict=3D=3DNULL) return 0; - return ZSTD_getDictID_fromDict(ddict->dictContent, ddict->dictSize); + return ddict->dictID; } diff --git a/lib/zstd/decompress/zstd_ddict.h b/lib/zstd/decompress/zstd_dd= ict.h index 8c1a79d666f8..de459a0dacd1 100644 --- a/lib/zstd/decompress/zstd_ddict.h +++ b/lib/zstd/decompress/zstd_ddict.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the diff --git a/lib/zstd/decompress/zstd_decompress.c b/lib/zstd/decompress/zs= td_decompress.c index 6b3177c94711..bb009554e3a6 100644 --- a/lib/zstd/decompress/zstd_decompress.c +++ b/lib/zstd/decompress/zstd_decompress.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -53,13 +54,15 @@ * Dependencies *********************************************************/ #include "../common/zstd_deps.h" /* ZSTD_memcpy, ZSTD_memmove, ZSTD_mems= et */ +#include "../common/allocations.h" /* ZSTD_customMalloc, ZSTD_customCallo= c, ZSTD_customFree */ +#include "../common/error_private.h" +#include "../common/zstd_internal.h" /* blockProperties_t */ #include "../common/mem.h" /* low level memory routines */ +#include "../common/bits.h" /* ZSTD_highbit32 */ #define FSE_STATIC_LINKING_ONLY #include "../common/fse.h" -#define HUF_STATIC_LINKING_ONLY #include "../common/huf.h" #include /* xxh64_reset, xxh64_update, xxh64_digest, XXH6= 4 */ -#include "../common/zstd_internal.h" /* blockProperties_t */ #include "zstd_decompress_internal.h" /* ZSTD_DCtx */ #include "zstd_ddict.h" /* ZSTD_DDictDictContent */ #include "zstd_decompress_block.h" /* ZSTD_decompressBlock_internal */ @@ -72,11 +75,11 @@ *************************************/ =20 #define DDICT_HASHSET_MAX_LOAD_FACTOR_COUNT_MULT 4 -#define DDICT_HASHSET_MAX_LOAD_FACTOR_SIZE_MULT 3 /* These two constants= represent SIZE_MULT/COUNT_MULT load factor without using a float. - * Currently, that mea= ns a 0.75 load factor. - * So, if count * COUN= T_MULT / size * SIZE_MULT !=3D 0, then we've exceeded - * the load factor of = the ddict hash set. - */ +#define DDICT_HASHSET_MAX_LOAD_FACTOR_SIZE_MULT 3 /* These two constants = represent SIZE_MULT/COUNT_MULT load factor without using a float. + * Currently, that mean= s a 0.75 load factor. + * So, if count * COUNT= _MULT / size * SIZE_MULT !=3D 0, then we've exceeded + * the load factor of t= he ddict hash set. + */ =20 #define DDICT_HASHSET_TABLE_BASE_SIZE 64 #define DDICT_HASHSET_RESIZE_FACTOR 2 @@ -237,6 +240,8 @@ static void ZSTD_DCtx_resetParameters(ZSTD_DCtx* dctx) dctx->outBufferMode =3D ZSTD_bm_buffered; dctx->forceIgnoreChecksum =3D ZSTD_d_validateChecksum; dctx->refMultipleDDicts =3D ZSTD_rmd_refSingleDDict; + dctx->disableHufAsm =3D 0; + dctx->maxBlockSizeParam =3D 0; } =20 static void ZSTD_initDCtx_internal(ZSTD_DCtx* dctx) @@ -253,6 +258,7 @@ static void ZSTD_initDCtx_internal(ZSTD_DCtx* dctx) dctx->streamStage =3D zdss_init; dctx->noForwardProgress =3D 0; dctx->oversizedDuration =3D 0; + dctx->isFrameDecompression =3D 1; #if DYNAMIC_BMI2 dctx->bmi2 =3D ZSTD_cpuSupportsBmi2(); #endif @@ -421,16 +427,40 @@ size_t ZSTD_frameHeaderSize(const void* src, size_t s= rcSize) * note : only works for formats ZSTD_f_zstd1 and ZSTD_f_zstd1_magicless * @return : 0, `zfhPtr` is correctly filled, * >0, `srcSize` is too small, value is wanted `srcSize` amount, - * or an error code, which can be tested using ZSTD_isError() */ -size_t ZSTD_getFrameHeader_advanced(ZSTD_frameHeader* zfhPtr, const void* = src, size_t srcSize, ZSTD_format_e format) +** or an error code, which can be tested using ZSTD_isError() */ +size_t ZSTD_getFrameHeader_advanced(ZSTD_FrameHeader* zfhPtr, const void* = src, size_t srcSize, ZSTD_format_e format) { const BYTE* ip =3D (const BYTE*)src; size_t const minInputSize =3D ZSTD_startingInputLength(format); =20 - ZSTD_memset(zfhPtr, 0, sizeof(*zfhPtr)); /* not strictly necessary, = but static analyzer do not understand that zfhPtr is only going to be read = only if return value is zero, since they are 2 different signals */ - if (srcSize < minInputSize) return minInputSize; - RETURN_ERROR_IF(src=3D=3DNULL, GENERIC, "invalid parameter"); + DEBUGLOG(5, "ZSTD_getFrameHeader_advanced: 
minInputSize =3D %zu, srcSi= ze =3D %zu", minInputSize, srcSize); + + if (srcSize > 0) { + /* note : technically could be considered an assert(), since it's = an invalid entry */ + RETURN_ERROR_IF(src=3D=3DNULL, GENERIC, "invalid parameter : src= =3D=3DNULL, but srcSize>0"); + } + if (srcSize < minInputSize) { + if (srcSize > 0 && format !=3D ZSTD_f_zstd1_magicless) { + /* when receiving less than @minInputSize bytes, + * control these bytes at least correspond to a supported magi= c number + * in order to error out early if they don't. + **/ + size_t const toCopy =3D MIN(4, srcSize); + unsigned char hbuf[4]; MEM_writeLE32(hbuf, ZSTD_MAGICNUMBER); + assert(src !=3D NULL); + ZSTD_memcpy(hbuf, src, toCopy); + if ( MEM_readLE32(hbuf) !=3D ZSTD_MAGICNUMBER ) { + /* not a zstd frame : let's check if it's a skippable fram= e */ + MEM_writeLE32(hbuf, ZSTD_MAGIC_SKIPPABLE_START); + ZSTD_memcpy(hbuf, src, toCopy); + if ((MEM_readLE32(hbuf) & ZSTD_MAGIC_SKIPPABLE_MASK) !=3D = ZSTD_MAGIC_SKIPPABLE_START) { + RETURN_ERROR(prefix_unknown, + "first bytes don't correspond to any suppo= rted magic number"); + } } } + return minInputSize; + } =20 + ZSTD_memset(zfhPtr, 0, sizeof(*zfhPtr)); /* not strictly necessary, = but static analyzers may not understand that zfhPtr will be read only if re= turn value is zero, since they are 2 different signals */ if ( (format !=3D ZSTD_f_zstd1_magicless) && (MEM_readLE32(src) !=3D ZSTD_MAGICNUMBER) ) { if ((MEM_readLE32(src) & ZSTD_MAGIC_SKIPPABLE_MASK) =3D=3D ZSTD_MA= GIC_SKIPPABLE_START) { @@ -438,8 +468,10 @@ size_t ZSTD_getFrameHeader_advanced(ZSTD_frameHeader* = zfhPtr, const void* src, s if (srcSize < ZSTD_SKIPPABLEHEADERSIZE) return ZSTD_SKIPPABLEHEADERSIZE; /* magic number + frame l= ength */ ZSTD_memset(zfhPtr, 0, sizeof(*zfhPtr)); - zfhPtr->frameContentSize =3D MEM_readLE32((const char *)src + = ZSTD_FRAMEIDSIZE); zfhPtr->frameType =3D ZSTD_skippableFrame; + zfhPtr->dictID =3D MEM_readLE32(src) - ZSTD_MAGIC_SKIPPABLE_ST= ART; + zfhPtr->headerSize =3D ZSTD_SKIPPABLEHEADERSIZE; + zfhPtr->frameContentSize =3D MEM_readLE32((const char *)src + = ZSTD_FRAMEIDSIZE); return 0; } RETURN_ERROR(prefix_unknown, ""); @@ -508,7 +540,7 @@ size_t ZSTD_getFrameHeader_advanced(ZSTD_frameHeader* z= fhPtr, const void* src, s * @return : 0, `zfhPtr` is correctly filled, * >0, `srcSize` is too small, value is wanted `srcSize` amount, * or an error code, which can be tested using ZSTD_isError() */ -size_t ZSTD_getFrameHeader(ZSTD_frameHeader* zfhPtr, const void* src, size= _t srcSize) +size_t ZSTD_getFrameHeader(ZSTD_FrameHeader* zfhPtr, const void* src, size= _t srcSize) { return ZSTD_getFrameHeader_advanced(zfhPtr, src, srcSize, ZSTD_f_zstd1= ); } @@ -520,7 +552,7 @@ size_t ZSTD_getFrameHeader(ZSTD_frameHeader* zfhPtr, co= nst void* src, size_t src * - ZSTD_CONTENTSIZE_ERROR if an error occurred (e.g. 
invalid mag= ic number, srcSize too small) */ unsigned long long ZSTD_getFrameContentSize(const void *src, size_t srcSiz= e) { - { ZSTD_frameHeader zfh; + { ZSTD_FrameHeader zfh; if (ZSTD_getFrameHeader(&zfh, src, srcSize) !=3D 0) return ZSTD_CONTENTSIZE_ERROR; if (zfh.frameType =3D=3D ZSTD_skippableFrame) { @@ -540,49 +572,52 @@ static size_t readSkippableFrameSize(void const* src,= size_t srcSize) sizeU32 =3D MEM_readLE32((BYTE const*)src + ZSTD_FRAMEIDSIZE); RETURN_ERROR_IF((U32)(sizeU32 + ZSTD_SKIPPABLEHEADERSIZE) < sizeU32, frameParameter_unsupported, ""); - { - size_t const skippableSize =3D skippableHeaderSize + sizeU32; + { size_t const skippableSize =3D skippableHeaderSize + sizeU32; RETURN_ERROR_IF(skippableSize > srcSize, srcSize_wrong, ""); return skippableSize; } } =20 /*! ZSTD_readSkippableFrame() : - * Retrieves a zstd skippable frame containing data given by src, and writ= es it to dst buffer. + * Retrieves content of a skippable frame, and writes it to dst buffer. * * The parameter magicVariant will receive the magicVariant that was suppl= ied when the frame was written, * i.e. magicNumber - ZSTD_MAGIC_SKIPPABLE_START. This can be NULL if the= caller is not interested * in the magicVariant. * - * Returns an error if destination buffer is not large enough, or if the f= rame is not skippable. + * Returns an error if destination buffer is not large enough, or if this = is not a valid skippable frame. * * @return : number of bytes written or a ZSTD error. */ -ZSTDLIB_API size_t ZSTD_readSkippableFrame(void* dst, size_t dstCapacity, = unsigned* magicVariant, - const void* src, size_t srcSiz= e) +size_t ZSTD_readSkippableFrame(void* dst, size_t dstCapacity, + unsigned* magicVariant, /* optional, can b= e NULL */ + const void* src, size_t srcSize) { - U32 const magicNumber =3D MEM_readLE32(src); - size_t skippableFrameSize =3D readSkippableFrameSize(src, srcSize); - size_t skippableContentSize =3D skippableFrameSize - ZSTD_SKIPPABLEHEA= DERSIZE; - - /* check input validity */ - RETURN_ERROR_IF(!ZSTD_isSkippableFrame(src, srcSize), frameParameter_u= nsupported, ""); - RETURN_ERROR_IF(skippableFrameSize < ZSTD_SKIPPABLEHEADERSIZE || skipp= ableFrameSize > srcSize, srcSize_wrong, ""); - RETURN_ERROR_IF(skippableContentSize > dstCapacity, dstSize_tooSmall, = ""); + RETURN_ERROR_IF(srcSize < ZSTD_SKIPPABLEHEADERSIZE, srcSize_wrong, ""); =20 - /* deliver payload */ - if (skippableContentSize > 0 && dst !=3D NULL) - ZSTD_memcpy(dst, (const BYTE *)src + ZSTD_SKIPPABLEHEADERSIZE, ski= ppableContentSize); - if (magicVariant !=3D NULL) - *magicVariant =3D magicNumber - ZSTD_MAGIC_SKIPPABLE_START; - return skippableContentSize; + { U32 const magicNumber =3D MEM_readLE32(src); + size_t skippableFrameSize =3D readSkippableFrameSize(src, srcSize); + size_t skippableContentSize =3D skippableFrameSize - ZSTD_SKIPPABL= EHEADERSIZE; + + /* check input validity */ + RETURN_ERROR_IF(!ZSTD_isSkippableFrame(src, srcSize), frameParamet= er_unsupported, ""); + RETURN_ERROR_IF(skippableFrameSize < ZSTD_SKIPPABLEHEADERSIZE || s= kippableFrameSize > srcSize, srcSize_wrong, ""); + RETURN_ERROR_IF(skippableContentSize > dstCapacity, dstSize_tooSma= ll, ""); + + /* deliver payload */ + if (skippableContentSize > 0 && dst !=3D NULL) + ZSTD_memcpy(dst, (const BYTE *)src + ZSTD_SKIPPABLEHEADERSIZE,= skippableContentSize); + if (magicVariant !=3D NULL) + *magicVariant =3D magicNumber - ZSTD_MAGIC_SKIPPABLE_START; + return skippableContentSize; + } } =20 /* ZSTD_findDecompressedSize() : - * compatible with legacy 
mode * `srcSize` must be the exact length of some number of ZSTD compressed a= nd/or * skippable frames - * @return : decompressed size of the frames contained */ + * note: compatible with legacy mode + * @return : decompressed size of the frames contained */ unsigned long long ZSTD_findDecompressedSize(const void* src, size_t srcSi= ze) { unsigned long long totalDstSize =3D 0; @@ -592,9 +627,7 @@ unsigned long long ZSTD_findDecompressedSize(const void= * src, size_t srcSize) =20 if ((magicNumber & ZSTD_MAGIC_SKIPPABLE_MASK) =3D=3D ZSTD_MAGIC_SK= IPPABLE_START) { size_t const skippableSize =3D readSkippableFrameSize(src, src= Size); - if (ZSTD_isError(skippableSize)) { - return ZSTD_CONTENTSIZE_ERROR; - } + if (ZSTD_isError(skippableSize)) return ZSTD_CONTENTSIZE_ERROR; assert(skippableSize <=3D srcSize); =20 src =3D (const BYTE *)src + skippableSize; @@ -602,17 +635,17 @@ unsigned long long ZSTD_findDecompressedSize(const vo= id* src, size_t srcSize) continue; } =20 - { unsigned long long const ret =3D ZSTD_getFrameContentSize(src,= srcSize); - if (ret >=3D ZSTD_CONTENTSIZE_ERROR) return ret; + { unsigned long long const fcs =3D ZSTD_getFrameContentSize(src,= srcSize); + if (fcs >=3D ZSTD_CONTENTSIZE_ERROR) return fcs; =20 - /* check for overflow */ - if (totalDstSize + ret < totalDstSize) return ZSTD_CONTENTSIZE= _ERROR; - totalDstSize +=3D ret; + if (totalDstSize + fcs < totalDstSize) + return ZSTD_CONTENTSIZE_ERROR; /* check for overflow */ + totalDstSize +=3D fcs; } + /* skip to next frame */ { size_t const frameSrcSize =3D ZSTD_findFrameCompressedSize(src= , srcSize); - if (ZSTD_isError(frameSrcSize)) { - return ZSTD_CONTENTSIZE_ERROR; - } + if (ZSTD_isError(frameSrcSize)) return ZSTD_CONTENTSIZE_ERROR; + assert(frameSrcSize <=3D srcSize); =20 src =3D (const BYTE *)src + frameSrcSize; srcSize -=3D frameSrcSize; @@ -676,13 +709,13 @@ static ZSTD_frameSizeInfo ZSTD_errorFrameSizeInfo(siz= e_t ret) return frameSizeInfo; } =20 -static ZSTD_frameSizeInfo ZSTD_findFrameSizeInfo(const void* src, size_t s= rcSize) +static ZSTD_frameSizeInfo ZSTD_findFrameSizeInfo(const void* src, size_t s= rcSize, ZSTD_format_e format) { ZSTD_frameSizeInfo frameSizeInfo; ZSTD_memset(&frameSizeInfo, 0, sizeof(ZSTD_frameSizeInfo)); =20 =20 - if ((srcSize >=3D ZSTD_SKIPPABLEHEADERSIZE) + if (format =3D=3D ZSTD_f_zstd1 && (srcSize >=3D ZSTD_SKIPPABLEHEADERSI= ZE) && (MEM_readLE32(src) & ZSTD_MAGIC_SKIPPABLE_MASK) =3D=3D ZSTD_MAG= IC_SKIPPABLE_START) { frameSizeInfo.compressedSize =3D readSkippableFrameSize(src, srcSi= ze); assert(ZSTD_isError(frameSizeInfo.compressedSize) || @@ -693,10 +726,10 @@ static ZSTD_frameSizeInfo ZSTD_findFrameSizeInfo(cons= t void* src, size_t srcSize const BYTE* const ipstart =3D ip; size_t remainingSize =3D srcSize; size_t nbBlocks =3D 0; - ZSTD_frameHeader zfh; + ZSTD_FrameHeader zfh; =20 /* Extract Frame Header */ - { size_t const ret =3D ZSTD_getFrameHeader(&zfh, src, srcSize); + { size_t const ret =3D ZSTD_getFrameHeader_advanced(&zfh, src, s= rcSize, format); if (ZSTD_isError(ret)) return ZSTD_errorFrameSizeInfo(ret); if (ret > 0) @@ -730,28 +763,31 @@ static ZSTD_frameSizeInfo ZSTD_findFrameSizeInfo(cons= t void* src, size_t srcSize ip +=3D 4; } =20 + frameSizeInfo.nbBlocks =3D nbBlocks; frameSizeInfo.compressedSize =3D (size_t)(ip - ipstart); frameSizeInfo.decompressedBound =3D (zfh.frameContentSize !=3D ZST= D_CONTENTSIZE_UNKNOWN) ? 
zfh.frameContentSize - : nbBlocks * zfh.blockSizeMax; + : (unsigned long long)nbBlocks * z= fh.blockSizeMax; return frameSizeInfo; } } =20 +static size_t ZSTD_findFrameCompressedSize_advanced(const void *src, size_= t srcSize, ZSTD_format_e format) { + ZSTD_frameSizeInfo const frameSizeInfo =3D ZSTD_findFrameSizeInfo(src,= srcSize, format); + return frameSizeInfo.compressedSize; +} + /* ZSTD_findFrameCompressedSize() : - * compatible with legacy mode - * `src` must point to the start of a ZSTD frame, ZSTD legacy frame, or s= kippable frame - * `srcSize` must be at least as large as the frame contained - * @return : the compressed size of the frame starting at `src` */ + * See docs in zstd.h + * Note: compatible with legacy mode */ size_t ZSTD_findFrameCompressedSize(const void *src, size_t srcSize) { - ZSTD_frameSizeInfo const frameSizeInfo =3D ZSTD_findFrameSizeInfo(src,= srcSize); - return frameSizeInfo.compressedSize; + return ZSTD_findFrameCompressedSize_advanced(src, srcSize, ZSTD_f_zstd= 1); } =20 /* ZSTD_decompressBound() : * compatible with legacy mode - * `src` must point to the start of a ZSTD frame or a skippeable frame + * `src` must point to the start of a ZSTD frame or a skippable frame * `srcSize` must be at least as large as the frame contained * @return : the maximum decompressed size of the compressed source */ @@ -760,7 +796,7 @@ unsigned long long ZSTD_decompressBound(const void* src= , size_t srcSize) unsigned long long bound =3D 0; /* Iterate over each frame */ while (srcSize > 0) { - ZSTD_frameSizeInfo const frameSizeInfo =3D ZSTD_findFrameSizeInfo(= src, srcSize); + ZSTD_frameSizeInfo const frameSizeInfo =3D ZSTD_findFrameSizeInfo(= src, srcSize, ZSTD_f_zstd1); size_t const compressedSize =3D frameSizeInfo.compressedSize; unsigned long long const decompressedBound =3D frameSizeInfo.decom= pressedBound; if (ZSTD_isError(compressedSize) || decompressedBound =3D=3D ZSTD_= CONTENTSIZE_ERROR) @@ -773,6 +809,48 @@ unsigned long long ZSTD_decompressBound(const void* sr= c, size_t srcSize) return bound; } =20 +size_t ZSTD_decompressionMargin(void const* src, size_t srcSize) +{ + size_t margin =3D 0; + unsigned maxBlockSize =3D 0; + + /* Iterate over each frame */ + while (srcSize > 0) { + ZSTD_frameSizeInfo const frameSizeInfo =3D ZSTD_findFrameSizeInfo(= src, srcSize, ZSTD_f_zstd1); + size_t const compressedSize =3D frameSizeInfo.compressedSize; + unsigned long long const decompressedBound =3D frameSizeInfo.decom= pressedBound; + ZSTD_FrameHeader zfh; + + FORWARD_IF_ERROR(ZSTD_getFrameHeader(&zfh, src, srcSize), ""); + if (ZSTD_isError(compressedSize) || decompressedBound =3D=3D ZSTD_= CONTENTSIZE_ERROR) + return ERROR(corruption_detected); + + if (zfh.frameType =3D=3D ZSTD_frame) { + /* Add the frame header to our margin */ + margin +=3D zfh.headerSize; + /* Add the checksum to our margin */ + margin +=3D zfh.checksumFlag ? 4 : 0; + /* Add 3 bytes per block */ + margin +=3D 3 * frameSizeInfo.nbBlocks; + + /* Compute the max block size */ + maxBlockSize =3D MAX(maxBlockSize, zfh.blockSizeMax); + } else { + assert(zfh.frameType =3D=3D ZSTD_skippableFrame); + /* Add the entire skippable frame size to our margin. */ + margin +=3D compressedSize; + } + + assert(srcSize >=3D compressedSize); + src =3D (const BYTE*)src + compressedSize; + srcSize -=3D compressedSize; + } + + /* Add the max block size back to the margin. 
*/ + margin +=3D maxBlockSize; + + return margin; +} =20 /*-************************************************************* * Frame decoding @@ -815,7 +893,7 @@ static size_t ZSTD_setRleBlock(void* dst, size_t dstCap= acity, return regenSize; } =20 -static void ZSTD_DCtx_trace_end(ZSTD_DCtx const* dctx, U64 uncompressedSiz= e, U64 compressedSize, unsigned streaming) +static void ZSTD_DCtx_trace_end(ZSTD_DCtx const* dctx, U64 uncompressedSiz= e, U64 compressedSize, int streaming) { (void)dctx; (void)uncompressedSize; @@ -856,6 +934,10 @@ static size_t ZSTD_decompressFrame(ZSTD_DCtx* dctx, ip +=3D frameHeaderSize; remainingSrcSize -=3D frameHeaderSize; } =20 + /* Shrink the blockSizeMax if enabled */ + if (dctx->maxBlockSizeParam !=3D 0) + dctx->fParams.blockSizeMax =3D MIN(dctx->fParams.blockSizeMax, (un= signed)dctx->maxBlockSizeParam); + /* Loop on each block */ while (1) { BYTE* oBlockEnd =3D oend; @@ -888,7 +970,8 @@ static size_t ZSTD_decompressFrame(ZSTD_DCtx* dctx, switch(blockProperties.blockType) { case bt_compressed: - decodedSize =3D ZSTD_decompressBlock_internal(dctx, op, (size_= t)(oBlockEnd-op), ip, cBlockSize, /* frame */ 1, not_streaming); + assert(dctx->isFrameDecompression =3D=3D 1); + decodedSize =3D ZSTD_decompressBlock_internal(dctx, op, (size_= t)(oBlockEnd-op), ip, cBlockSize, not_streaming); break; case bt_raw : /* Use oend instead of oBlockEnd because this function is safe= to overlap. It uses memmove. */ @@ -901,12 +984,14 @@ static size_t ZSTD_decompressFrame(ZSTD_DCtx* dctx, default: RETURN_ERROR(corruption_detected, "invalid block type"); } - - if (ZSTD_isError(decodedSize)) return decodedSize; - if (dctx->validateChecksum) + FORWARD_IF_ERROR(decodedSize, "Block decompression failure"); + DEBUGLOG(5, "Decompressed block of dSize =3D %u", (unsigned)decode= dSize); + if (dctx->validateChecksum) { xxh64_update(&dctx->xxhState, op, decodedSize); - if (decodedSize !=3D 0) + } + if (decodedSize) /* support dst =3D NULL,0 */ { op +=3D decodedSize; + } assert(ip !=3D NULL); ip +=3D cBlockSize; remainingSrcSize -=3D cBlockSize; @@ -930,12 +1015,15 @@ static size_t ZSTD_decompressFrame(ZSTD_DCtx* dctx, } ZSTD_DCtx_trace_end(dctx, (U64)(op-ostart), (U64)(ip-istart), /* strea= ming */ 0); /* Allow caller to get size read */ + DEBUGLOG(4, "ZSTD_decompressFrame: decompressed frame of size %i, cons= uming %i bytes of input", (int)(op-ostart), (int)(ip - (const BYTE*)*srcPtr= )); *srcPtr =3D ip; *srcSizePtr =3D remainingSrcSize; return (size_t)(op-ostart); } =20 -static size_t ZSTD_decompressMultiFrame(ZSTD_DCtx* dctx, +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t ZSTD_decompressMultiFrame(ZSTD_DCtx* dctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize, const void* dict, size_t dictSize, @@ -955,17 +1043,18 @@ static size_t ZSTD_decompressMultiFrame(ZSTD_DCtx* d= ctx, while (srcSize >=3D ZSTD_startingInputLength(dctx->format)) { =20 =20 - { U32 const magicNumber =3D MEM_readLE32(src); - DEBUGLOG(4, "reading magic number %08X (expecting %08X)", - (unsigned)magicNumber, ZSTD_MAGICNUMBER); + if (dctx->format =3D=3D ZSTD_f_zstd1 && srcSize >=3D 4) { + U32 const magicNumber =3D MEM_readLE32(src); + DEBUGLOG(5, "reading magic number %08X", (unsigned)magicNumber= ); if ((magicNumber & ZSTD_MAGIC_SKIPPABLE_MASK) =3D=3D ZSTD_MAGI= C_SKIPPABLE_START) { + /* skippable frame detected : skip it */ size_t const skippableSize =3D readSkippableFrameSize(src,= srcSize); - FORWARD_IF_ERROR(skippableSize, "readSkippableFrameSize fa= iled"); + FORWARD_IF_ERROR(skippableSize, 
"invalid skippable frame"); assert(skippableSize <=3D srcSize); =20 src =3D (const BYTE *)src + skippableSize; srcSize -=3D skippableSize; - continue; + continue; /* check next frame */ } } =20 if (ddict) { @@ -1061,8 +1150,8 @@ size_t ZSTD_decompress(void* dst, size_t dstCapacity,= const void* src, size_t sr size_t ZSTD_nextSrcSizeToDecompress(ZSTD_DCtx* dctx) { return dctx->expect= ed; } =20 /* - * Similar to ZSTD_nextSrcSizeToDecompress(), but when a block input can b= e streamed, - * we allow taking a partial block as the input. Currently only raw uncomp= ressed blocks can + * Similar to ZSTD_nextSrcSizeToDecompress(), but when a block input can b= e streamed, we + * allow taking a partial block as the input. Currently only raw uncompres= sed blocks can * be streamed. * * For blocks that can be streamed, this allows us to reduce the latency u= ntil we produce @@ -1181,7 +1270,8 @@ size_t ZSTD_decompressContinue(ZSTD_DCtx* dctx, void*= dst, size_t dstCapacity, c { case bt_compressed: DEBUGLOG(5, "ZSTD_decompressContinue: case bt_compressed"); - rSize =3D ZSTD_decompressBlock_internal(dctx, dst, dstCapa= city, src, srcSize, /* frame */ 1, is_streaming); + assert(dctx->isFrameDecompression =3D=3D 1); + rSize =3D ZSTD_decompressBlock_internal(dctx, dst, dstCapa= city, src, srcSize, is_streaming); dctx->expected =3D 0; /* Streaming not supported */ break; case bt_raw : @@ -1250,6 +1340,7 @@ size_t ZSTD_decompressContinue(ZSTD_DCtx* dctx, void*= dst, size_t dstCapacity, c case ZSTDds_decodeSkippableHeader: assert(src !=3D NULL); assert(srcSize <=3D ZSTD_SKIPPABLEHEADERSIZE); + assert(dctx->format !=3D ZSTD_f_zstd1_magicless); ZSTD_memcpy(dctx->headerBuffer + (ZSTD_SKIPPABLEHEADERSIZE - srcSi= ze), src, srcSize); /* complete skippable header */ dctx->expected =3D MEM_readLE32(dctx->headerBuffer + ZSTD_FRAMEIDS= IZE); /* note : dctx->expected can grow seriously large, beyond local buf= fer size */ dctx->stage =3D ZSTDds_skipFrame; @@ -1262,7 +1353,7 @@ size_t ZSTD_decompressContinue(ZSTD_DCtx* dctx, void*= dst, size_t dstCapacity, c =20 default: assert(0); /* impossible */ - RETURN_ERROR(GENERIC, "impossible to reach"); /* some compiler r= equire default to do something */ + RETURN_ERROR(GENERIC, "impossible to reach"); /* some compilers = require default to do something */ } } =20 @@ -1303,11 +1394,11 @@ ZSTD_loadDEntropy(ZSTD_entropyDTables_t* entropy, /* in minimal huffman, we always use X1 variants */ size_t const hSize =3D HUF_readDTableX1_wksp(entropy->hufTable, dictPtr, dictEnd - dictPtr, - workspace, workspaceSize); + workspace, workspaceSize, = /* flags */ 0); #else size_t const hSize =3D HUF_readDTableX2_wksp(entropy->hufTable, dictPtr, (size_t)(dictEnd = - dictPtr), - workspace, workspaceSize); + workspace, workspaceSize, = /* flags */ 0); #endif RETURN_ERROR_IF(HUF_isError(hSize), dictionary_corrupted, ""); dictPtr +=3D hSize; @@ -1403,10 +1494,11 @@ size_t ZSTD_decompressBegin(ZSTD_DCtx* dctx) dctx->prefixStart =3D NULL; dctx->virtualStart =3D NULL; dctx->dictEnd =3D NULL; - dctx->entropy.hufTable[0] =3D (HUF_DTable)((HufLog)*0x1000001); /* co= ver both little and big endian */ + dctx->entropy.hufTable[0] =3D (HUF_DTable)((ZSTD_HUFFDTABLE_CAPACITY_L= OG)*0x1000001); /* cover both little and big endian */ dctx->litEntropy =3D dctx->fseEntropy =3D 0; dctx->dictID =3D 0; dctx->bType =3D bt_reserved; + dctx->isFrameDecompression =3D 1; ZSTD_STATIC_ASSERT(sizeof(dctx->entropy.rep) =3D=3D sizeof(repStartVal= ue)); ZSTD_memcpy(dctx->entropy.rep, repStartValue, sizeof(repStartValue)); = 
/* initial repcodes */ dctx->LLTptr =3D dctx->entropy.LLTable; @@ -1465,7 +1557,7 @@ unsigned ZSTD_getDictID_fromDict(const void* dict, si= ze_t dictSize) * This could for one of the following reasons : * - The frame does not require a dictionary (most common case). * - The frame was built with dictID intentionally removed. - * Needed dictionary is a hidden information. + * Needed dictionary is a hidden piece of information. * Note : this use case also happens when using a non-conformant dictio= nary. * - `srcSize` is too small, and as a result, frame header could not be d= ecoded. * Note : possible if `srcSize < ZSTD_FRAMEHEADERSIZE_MAX`. @@ -1474,7 +1566,7 @@ unsigned ZSTD_getDictID_fromDict(const void* dict, si= ze_t dictSize) * ZSTD_getFrameHeader(), which will provide a more precise error code. */ unsigned ZSTD_getDictID_fromFrame(const void* src, size_t srcSize) { - ZSTD_frameHeader zfp =3D { 0, 0, 0, ZSTD_frame, 0, 0, 0 }; + ZSTD_FrameHeader zfp =3D { 0, 0, 0, ZSTD_frame, 0, 0, 0, 0, 0 }; size_t const hError =3D ZSTD_getFrameHeader(&zfp, src, srcSize); if (ZSTD_isError(hError)) return 0; return zfp.dictID; @@ -1581,7 +1673,9 @@ size_t ZSTD_initDStream_usingDict(ZSTD_DStream* zds, = const void* dict, size_t di size_t ZSTD_initDStream(ZSTD_DStream* zds) { DEBUGLOG(4, "ZSTD_initDStream"); - return ZSTD_initDStream_usingDDict(zds, NULL); + FORWARD_IF_ERROR(ZSTD_DCtx_reset(zds, ZSTD_reset_session_only), ""); + FORWARD_IF_ERROR(ZSTD_DCtx_refDDict(zds, NULL), ""); + return ZSTD_startingInputLength(zds->format); } =20 /* ZSTD_initDStream_usingDDict() : @@ -1589,6 +1683,7 @@ size_t ZSTD_initDStream(ZSTD_DStream* zds) * this function cannot fail */ size_t ZSTD_initDStream_usingDDict(ZSTD_DStream* dctx, const ZSTD_DDict* d= dict) { + DEBUGLOG(4, "ZSTD_initDStream_usingDDict"); FORWARD_IF_ERROR( ZSTD_DCtx_reset(dctx, ZSTD_reset_session_only) , ""); FORWARD_IF_ERROR( ZSTD_DCtx_refDDict(dctx, ddict) , ""); return ZSTD_startingInputLength(dctx->format); @@ -1599,6 +1694,7 @@ size_t ZSTD_initDStream_usingDDict(ZSTD_DStream* dctx= , const ZSTD_DDict* ddict) * this function cannot fail */ size_t ZSTD_resetDStream(ZSTD_DStream* dctx) { + DEBUGLOG(4, "ZSTD_resetDStream"); FORWARD_IF_ERROR(ZSTD_DCtx_reset(dctx, ZSTD_reset_session_only), ""); return ZSTD_startingInputLength(dctx->format); } @@ -1670,6 +1766,15 @@ ZSTD_bounds ZSTD_dParam_getBounds(ZSTD_dParameter dP= aram) bounds.lowerBound =3D (int)ZSTD_rmd_refSingleDDict; bounds.upperBound =3D (int)ZSTD_rmd_refMultipleDDicts; return bounds; + case ZSTD_d_disableHuffmanAssembly: + bounds.lowerBound =3D 0; + bounds.upperBound =3D 1; + return bounds; + case ZSTD_d_maxBlockSize: + bounds.lowerBound =3D ZSTD_BLOCKSIZE_MAX_MIN; + bounds.upperBound =3D ZSTD_BLOCKSIZE_MAX; + return bounds; + default:; } bounds.error =3D ERROR(parameter_unsupported); @@ -1710,6 +1815,12 @@ size_t ZSTD_DCtx_getParameter(ZSTD_DCtx* dctx, ZSTD_= dParameter param, int* value case ZSTD_d_refMultipleDDicts: *value =3D (int)dctx->refMultipleDDicts; return 0; + case ZSTD_d_disableHuffmanAssembly: + *value =3D (int)dctx->disableHufAsm; + return 0; + case ZSTD_d_maxBlockSize: + *value =3D dctx->maxBlockSizeParam; + return 0; default:; } RETURN_ERROR(parameter_unsupported, ""); @@ -1743,6 +1854,14 @@ size_t ZSTD_DCtx_setParameter(ZSTD_DCtx* dctx, ZSTD_= dParameter dParam, int value } dctx->refMultipleDDicts =3D (ZSTD_refMultipleDDicts_e)value; return 0; + case ZSTD_d_disableHuffmanAssembly: + CHECK_DBOUNDS(ZSTD_d_disableHuffmanAssembly, value); + dctx->disableHufAsm =3D value !=3D 0; + 
return 0; + case ZSTD_d_maxBlockSize: + if (value !=3D 0) CHECK_DBOUNDS(ZSTD_d_maxBlockSize, value); + dctx->maxBlockSizeParam =3D value; + return 0; default:; } RETURN_ERROR(parameter_unsupported, ""); @@ -1754,6 +1873,7 @@ size_t ZSTD_DCtx_reset(ZSTD_DCtx* dctx, ZSTD_ResetDir= ective reset) || (reset =3D=3D ZSTD_reset_session_and_parameters) ) { dctx->streamStage =3D zdss_init; dctx->noForwardProgress =3D 0; + dctx->isFrameDecompression =3D 1; } if ( (reset =3D=3D ZSTD_reset_parameters) || (reset =3D=3D ZSTD_reset_session_and_parameters) ) { @@ -1770,11 +1890,17 @@ size_t ZSTD_sizeof_DStream(const ZSTD_DStream* dctx) return ZSTD_sizeof_DCtx(dctx); } =20 -size_t ZSTD_decodingBufferSize_min(unsigned long long windowSize, unsigned= long long frameContentSize) +static size_t ZSTD_decodingBufferSize_internal(unsigned long long windowSi= ze, unsigned long long frameContentSize, size_t blockSizeMax) { - size_t const blockSize =3D (size_t) MIN(windowSize, ZSTD_BLOCKSIZE_MAX= ); - /* space is needed to store the litbuffer after the output of a given = block without stomping the extDict of a previous run, as well as to cover b= oth windows against wildcopy*/ - unsigned long long const neededRBSize =3D windowSize + blockSize + ZST= D_BLOCKSIZE_MAX + (WILDCOPY_OVERLENGTH * 2); + size_t const blockSize =3D MIN((size_t)MIN(windowSize, ZSTD_BLOCKSIZE_= MAX), blockSizeMax); + /* We need blockSize + WILDCOPY_OVERLENGTH worth of buffer so that if = a block + * ends at windowSize + WILDCOPY_OVERLENGTH + 1 bytes, we can start wr= iting + * the block at the beginning of the output buffer, and maintain a ful= l window. + * + * We need another blockSize worth of buffer so that we can store split + * literals at the end of the block without overwriting the extDict wi= ndow. 
+ */ + unsigned long long const neededRBSize =3D windowSize + (blockSize * 2)= + (WILDCOPY_OVERLENGTH * 2); unsigned long long const neededSize =3D MIN(frameContentSize, neededRB= Size); size_t const minRBSize =3D (size_t) neededSize; RETURN_ERROR_IF((unsigned long long)minRBSize !=3D neededSize, @@ -1782,6 +1908,11 @@ size_t ZSTD_decodingBufferSize_min(unsigned long lon= g windowSize, unsigned long return minRBSize; } =20 +size_t ZSTD_decodingBufferSize_min(unsigned long long windowSize, unsigned= long long frameContentSize) +{ + return ZSTD_decodingBufferSize_internal(windowSize, frameContentSize, = ZSTD_BLOCKSIZE_MAX); +} + size_t ZSTD_estimateDStreamSize(size_t windowSize) { size_t const blockSize =3D MIN(windowSize, ZSTD_BLOCKSIZE_MAX); @@ -1793,7 +1924,7 @@ size_t ZSTD_estimateDStreamSize(size_t windowSize) size_t ZSTD_estimateDStreamSize_fromFrame(const void* src, size_t srcSize) { U32 const windowSizeMax =3D 1U << ZSTD_WINDOWLOG_MAX; /* note : shou= ld be user-selectable, but requires an additional parameter (or a dctx) */ - ZSTD_frameHeader zfh; + ZSTD_FrameHeader zfh; size_t const err =3D ZSTD_getFrameHeader(&zfh, src, srcSize); if (ZSTD_isError(err)) return err; RETURN_ERROR_IF(err>0, srcSize_wrong, ""); @@ -1888,6 +2019,7 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_= outBuffer* output, ZSTD_inB U32 someMoreWork =3D 1; =20 DEBUGLOG(5, "ZSTD_decompressStream"); + assert(zds !=3D NULL); RETURN_ERROR_IF( input->pos > input->size, srcSize_wrong, @@ -1918,7 +2050,6 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_= outBuffer* output, ZSTD_inB if (zds->refMultipleDDicts && zds->ddictSet) { ZSTD_DCtx_selectFrameDDict(zds); } - DEBUGLOG(5, "header size : %u", (U32)hSize); if (ZSTD_isError(hSize)) { return hSize; /* error */ } @@ -1932,6 +2063,11 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD= _outBuffer* output, ZSTD_inB zds->lhSize +=3D remainingInput; } input->pos =3D input->size; + /* check first few bytes */ + FORWARD_IF_ERROR( + ZSTD_getFrameHeader_advanced(&zds->fParams, zd= s->headerBuffer, zds->lhSize, zds->format), + "First few bytes detected incorrect" ); + /* return hint input size */ return (MAX((size_t)ZSTD_FRAMEHEADERSIZE_MIN(zds->= format), hSize) - zds->lhSize) + ZSTD_blockHeaderSize; /* remaining heade= r bytes + next block header */ } assert(ip !=3D NULL); @@ -1943,14 +2079,15 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZST= D_outBuffer* output, ZSTD_inB if (zds->fParams.frameContentSize !=3D ZSTD_CONTENTSIZE_UNKNOWN && zds->fParams.frameType !=3D ZSTD_skippableFrame && (U64)(size_t)(oend-op) >=3D zds->fParams.frameContentSi= ze) { - size_t const cSize =3D ZSTD_findFrameCompressedSize(istart= , (size_t)(iend-istart)); + size_t const cSize =3D ZSTD_findFrameCompressedSize_advanc= ed(istart, (size_t)(iend-istart), zds->format); if (cSize <=3D (size_t)(iend-istart)) { /* shortcut : using single-pass mode */ size_t const decompressedSize =3D ZSTD_decompress_usin= gDDict(zds, op, (size_t)(oend-op), istart, cSize, ZSTD_getDDict(zds)); if (ZSTD_isError(decompressedSize)) return decompresse= dSize; - DEBUGLOG(4, "shortcut to single-pass ZSTD_decompress_u= singDDict()") + DEBUGLOG(4, "shortcut to single-pass ZSTD_decompress_u= singDDict()"); + assert(istart !=3D NULL); ip =3D istart + cSize; - op +=3D decompressedSize; + op =3D op ? 
op + decompressedSize : op; /* can occur i= f frameContentSize =3D 0 (empty frame) */ zds->expected =3D 0; zds->streamStage =3D zdss_init; someMoreWork =3D 0; @@ -1969,7 +2106,8 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_= outBuffer* output, ZSTD_inB DEBUGLOG(4, "Consume header"); FORWARD_IF_ERROR(ZSTD_decompressBegin_usingDDict(zds, ZSTD_get= DDict(zds)), ""); =20 - if ((MEM_readLE32(zds->headerBuffer) & ZSTD_MAGIC_SKIPPABLE_MA= SK) =3D=3D ZSTD_MAGIC_SKIPPABLE_START) { /* skippable frame */ + if (zds->format =3D=3D ZSTD_f_zstd1 + && (MEM_readLE32(zds->headerBuffer) & ZSTD_MAGIC_SKIPPABLE= _MASK) =3D=3D ZSTD_MAGIC_SKIPPABLE_START) { /* skippable frame */ zds->expected =3D MEM_readLE32(zds->headerBuffer + ZSTD_FR= AMEIDSIZE); zds->stage =3D ZSTDds_skipFrame; } else { @@ -1985,11 +2123,13 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZST= D_outBuffer* output, ZSTD_inB zds->fParams.windowSize =3D MAX(zds->fParams.windowSize, 1U <<= ZSTD_WINDOWLOG_ABSOLUTEMIN); RETURN_ERROR_IF(zds->fParams.windowSize > zds->maxWindowSize, frameParameter_windowTooLarge, ""); + if (zds->maxBlockSizeParam !=3D 0) + zds->fParams.blockSizeMax =3D MIN(zds->fParams.blockSizeMa= x, (unsigned)zds->maxBlockSizeParam); =20 /* Adapt buffer sizes to frame header instructions */ { size_t const neededInBuffSize =3D MAX(zds->fParams.blockSi= zeMax, 4 /* frame checksum */); size_t const neededOutBuffSize =3D zds->outBufferMode =3D= =3D ZSTD_bm_buffered - ? ZSTD_decodingBufferSize_min(zds->fParams.windowS= ize, zds->fParams.frameContentSize) + ? ZSTD_decodingBufferSize_internal(zds->fParams.wi= ndowSize, zds->fParams.frameContentSize, zds->fParams.blockSizeMax) : 0; =20 ZSTD_DCtx_updateOversizedDuration(zds, neededInBuffSize, n= eededOutBuffSize); @@ -2034,6 +2174,7 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_= outBuffer* output, ZSTD_inB } if ((size_t)(iend-ip) >=3D neededInSize) { /* decode dire= ctly from src */ FORWARD_IF_ERROR(ZSTD_decompressContinueStream(zds, &o= p, oend, ip, neededInSize), ""); + assert(ip !=3D NULL); ip +=3D neededInSize; /* Function modifies the stage so we must break */ break; @@ -2048,7 +2189,7 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_= outBuffer* output, ZSTD_inB int const isSkipFrame =3D ZSTD_isSkipFrame(zds); size_t loadedSize; /* At this point we shouldn't be decompressing a block tha= t we can stream. 
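
For orientation, the state machine being reworked here implements the usual contract of ZSTD_decompressStream(): feed input, drain output, a return of 0 marks a fully decoded and flushed frame, and any other non-error value is a hint for how much input to supply next. A minimal user-space caller might look like the sketch below (fixed 4 KB buffers for brevity; real callers would size them with ZSTD_DStreamInSize()/ZSTD_DStreamOutSize()):

  #include <stdio.h>
  #include <zstd.h>

  static int demo_decompress(FILE* fin, FILE* fout)
  {
      ZSTD_DCtx* const dctx = ZSTD_createDCtx();
      char inBuf[4096], outBuf[4096];
      size_t ret = 0, readSz;
      if (dctx == NULL) return -1;
      while ((readSz = fread(inBuf, 1, sizeof(inBuf), fin)) != 0) {
          ZSTD_inBuffer input = { inBuf, readSz, 0 };
          while (input.pos < input.size) {
              ZSTD_outBuffer output = { outBuf, sizeof(outBuf), 0 };
              ret = ZSTD_decompressStream(dctx, &output, &input);
              if (ZSTD_isError(ret)) goto done;
              fwrite(outBuf, 1, output.pos, fout);
          }
      }
  done:
      ZSTD_freeDCtx(dctx);
      /* ret == 0 means the last frame was fully decoded and flushed */
      return (ZSTD_isError(ret) || ret != 0) ? -1 : 0;
  }
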
*/ - assert(neededInSize =3D=3D ZSTD_nextSrcSizeToDecompressWit= hInputSize(zds, iend - ip)); + assert(neededInSize =3D=3D ZSTD_nextSrcSizeToDecompressWit= hInputSize(zds, (size_t)(iend - ip))); if (isSkipFrame) { loadedSize =3D MIN(toLoad, (size_t)(iend-ip)); } else { @@ -2057,8 +2198,11 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD= _outBuffer* output, ZSTD_inB "should never happen"); loadedSize =3D ZSTD_limitCopy(zds->inBuff + zds->inPos= , toLoad, ip, (size_t)(iend-ip)); } - ip +=3D loadedSize; - zds->inPos +=3D loadedSize; + if (loadedSize !=3D 0) { + /* ip may be NULL */ + ip +=3D loadedSize; + zds->inPos +=3D loadedSize; + } if (loadedSize < toLoad) { someMoreWork =3D 0; break; } = /* not enough input, wait for more */ =20 /* decode loaded input */ @@ -2068,14 +2212,17 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZST= D_outBuffer* output, ZSTD_inB break; } case zdss_flush: - { size_t const toFlushSize =3D zds->outEnd - zds->outStart; + { + size_t const toFlushSize =3D zds->outEnd - zds->outStart; size_t const flushedSize =3D ZSTD_limitCopy(op, (size_t)(o= end-op), zds->outBuff + zds->outStart, toFlushSize); - op +=3D flushedSize; + + op =3D op ? op + flushedSize : op; + zds->outStart +=3D flushedSize; if (flushedSize =3D=3D toFlushSize) { /* flush completed = */ zds->streamStage =3D zdss_read; if ( (zds->outBuffSize < zds->fParams.frameContentSize) - && (zds->outStart + zds->fParams.blockSizeMax > zds-= >outBuffSize) ) { + && (zds->outStart + zds->fParams.blockSizeMax > zd= s->outBuffSize) ) { DEBUGLOG(5, "restart filling outBuff from beginnin= g (left:%i, needed:%u)", (int)(zds->outBuffSize - zds->outStart), (U32)zds->fParams.blockSizeMax); @@ -2089,7 +2236,7 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_= outBuffer* output, ZSTD_inB =20 default: assert(0); /* impossible */ - RETURN_ERROR(GENERIC, "impossible to reach"); /* some compil= er require default to do something */ + RETURN_ERROR(GENERIC, "impossible to reach"); /* some compil= ers require default to do something */ } } =20 /* result */ @@ -2102,8 +2249,8 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_= outBuffer* output, ZSTD_inB if ((ip=3D=3Distart) && (op=3D=3Dostart)) { /* no forward progress */ zds->noForwardProgress ++; if (zds->noForwardProgress >=3D ZSTD_NO_FORWARD_PROGRESS_MAX) { - RETURN_ERROR_IF(op=3D=3Doend, dstSize_tooSmall, ""); - RETURN_ERROR_IF(ip=3D=3Diend, srcSize_wrong, ""); + RETURN_ERROR_IF(op=3D=3Doend, noForwardProgress_destFull, ""); + RETURN_ERROR_IF(ip=3D=3Diend, noForwardProgress_inputEmpty, ""= ); assert(0); } } else { @@ -2140,11 +2287,17 @@ size_t ZSTD_decompressStream_simpleArgs ( void* dst, size_t dstCapacity, size_t* dstPos, const void* src, size_t srcSize, size_t* srcPos) { - ZSTD_outBuffer output =3D { dst, dstCapacity, *dstPos }; - ZSTD_inBuffer input =3D { src, srcSize, *srcPos }; - /* ZSTD_compress_generic() will check validity of dstPos and srcPos */ - size_t const cErr =3D ZSTD_decompressStream(dctx, &output, &input); - *dstPos =3D output.pos; - *srcPos =3D input.pos; - return cErr; + ZSTD_outBuffer output; + ZSTD_inBuffer input; + output.dst =3D dst; + output.size =3D dstCapacity; + output.pos =3D *dstPos; + input.src =3D src; + input.size =3D srcSize; + input.pos =3D *srcPos; + { size_t const cErr =3D ZSTD_decompressStream(dctx, &output, &input); + *dstPos =3D output.pos; + *srcPos =3D input.pos; + return cErr; + } } diff --git a/lib/zstd/decompress/zstd_decompress_block.c b/lib/zstd/decompr= ess/zstd_decompress_block.c index 
c1913b8e7c89..710eb0ffd5a3 100644 --- a/lib/zstd/decompress/zstd_decompress_block.c +++ b/lib/zstd/decompress/zstd_decompress_block.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -20,12 +21,12 @@ #include "../common/mem.h" /* low level memory routines */ #define FSE_STATIC_LINKING_ONLY #include "../common/fse.h" -#define HUF_STATIC_LINKING_ONLY #include "../common/huf.h" #include "../common/zstd_internal.h" #include "zstd_decompress_internal.h" /* ZSTD_DCtx */ #include "zstd_ddict.h" /* ZSTD_DDictDictContent */ #include "zstd_decompress_block.h" +#include "../common/bits.h" /* ZSTD_highbit32 */ =20 /*_******************************************************* * Macros @@ -51,6 +52,13 @@ static void ZSTD_copy4(void* dst, const void* src) { ZST= D_memcpy(dst, src, 4); } * Block decoding ***************************************************************/ =20 +static size_t ZSTD_blockSizeMax(ZSTD_DCtx const* dctx) +{ + size_t const blockSizeMax =3D dctx->isFrameDecompression ? dctx->fPara= ms.blockSizeMax : ZSTD_BLOCKSIZE_MAX; + assert(blockSizeMax <=3D ZSTD_BLOCKSIZE_MAX); + return blockSizeMax; +} + /*! ZSTD_getcBlockSize() : * Provides the size of compressed block from block header `src` */ size_t ZSTD_getcBlockSize(const void* src, size_t srcSize, @@ -73,41 +81,49 @@ size_t ZSTD_getcBlockSize(const void* src, size_t srcSi= ze, static void ZSTD_allocateLiteralsBuffer(ZSTD_DCtx* dctx, void* const dst, = const size_t dstCapacity, const size_t litSize, const streaming_operation streaming, const size_t expectedWriteSize, c= onst unsigned splitImmediately) { - if (streaming =3D=3D not_streaming && dstCapacity > ZSTD_BLOCKSIZE_MAX= + WILDCOPY_OVERLENGTH + litSize + WILDCOPY_OVERLENGTH) - { - /* room for litbuffer to fit without read faulting */ - dctx->litBuffer =3D (BYTE*)dst + ZSTD_BLOCKSIZE_MAX + WILDCOPY_OVE= RLENGTH; + size_t const blockSizeMax =3D ZSTD_blockSizeMax(dctx); + assert(litSize <=3D blockSizeMax); + assert(dctx->isFrameDecompression || streaming =3D=3D not_streaming); + assert(expectedWriteSize <=3D blockSizeMax); + if (streaming =3D=3D not_streaming && dstCapacity > blockSizeMax + WIL= DCOPY_OVERLENGTH + litSize + WILDCOPY_OVERLENGTH) { + /* If we aren't streaming, we can just put the literals after the = output + * of the current block. We don't need to worry about overwriting = the + * extDict of our window, because it doesn't exist. + * So if we have space after the end of the block, just put it the= re. + */ + dctx->litBuffer =3D (BYTE*)dst + blockSizeMax + WILDCOPY_OVERLENGT= H; dctx->litBufferEnd =3D dctx->litBuffer + litSize; dctx->litBufferLocation =3D ZSTD_in_dst; - } - else if (litSize > ZSTD_LITBUFFEREXTRASIZE) - { - /* won't fit in litExtraBuffer, so it will be split between end of= dst and extra buffer */ + } else if (litSize <=3D ZSTD_LITBUFFEREXTRASIZE) { + /* Literals fit entirely within the extra buffer, put them there t= o avoid + * having to split the literals. + */ + dctx->litBuffer =3D dctx->litExtraBuffer; + dctx->litBufferEnd =3D dctx->litBuffer + litSize; + dctx->litBufferLocation =3D ZSTD_not_in_dst; + } else { + assert(blockSizeMax > ZSTD_LITBUFFEREXTRASIZE); + /* Literals must be split between the output block and the extra l= it + * buffer. 
We fill the extra lit buffer with the tail of the liter= als, + * and put the rest of the literals at the end of the block, with + * WILDCOPY_OVERLENGTH of buffer room to allow for overreads. + * This MUST not write more than our maxBlockSize beyond dst, beca= use in + * streaming mode, that could overwrite part of our extDict window. + */ if (splitImmediately) { /* won't fit in litExtraBuffer, so it will be split between en= d of dst and extra buffer */ dctx->litBuffer =3D (BYTE*)dst + expectedWriteSize - litSize += ZSTD_LITBUFFEREXTRASIZE - WILDCOPY_OVERLENGTH; dctx->litBufferEnd =3D dctx->litBuffer + litSize - ZSTD_LITBUF= FEREXTRASIZE; - } - else { - /* initially this will be stored entirely in dst during huffma= n decoding, it will partially shifted to litExtraBuffer after */ + } else { + /* initially this will be stored entirely in dst during huffma= n decoding, it will partially be shifted to litExtraBuffer after */ dctx->litBuffer =3D (BYTE*)dst + expectedWriteSize - litSize; dctx->litBufferEnd =3D (BYTE*)dst + expectedWriteSize; } dctx->litBufferLocation =3D ZSTD_split; - } - else - { - /* fits entirely within litExtraBuffer, so no split is necessary */ - dctx->litBuffer =3D dctx->litExtraBuffer; - dctx->litBufferEnd =3D dctx->litBuffer + litSize; - dctx->litBufferLocation =3D ZSTD_not_in_dst; + assert(dctx->litBufferEnd <=3D (BYTE*)dst + expectedWriteSize); } } =20 -/* Hidden declaration for fullbench */ -size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, - const void* src, size_t srcSize, - void* dst, size_t dstCapacity, const streaming_o= peration streaming); /*! ZSTD_decodeLiteralsBlock() : * Where it is possible to do so without being stomped by the output durin= g decompression, the literals block will be stored * in the dstBuffer. If there is room to do so, it will be stored in full= in the excess dst space after where the current @@ -116,7 +132,7 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, * * @return : nb of bytes read from src (< srcSize ) * note : symbol not declared but exposed for fullbench */ -size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, +static size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, const void* src, size_t srcSize, /* note : src= Size < BLOCKSIZE */ void* dst, size_t dstCapacity, const streaming_o= peration streaming) { @@ -124,7 +140,8 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, RETURN_ERROR_IF(srcSize < MIN_CBLOCK_SIZE, corruption_detected, ""); =20 { const BYTE* const istart =3D (const BYTE*) src; - symbolEncodingType_e const litEncType =3D (symbolEncodingType_e)(i= start[0] & 3); + SymbolEncodingType_e const litEncType =3D (SymbolEncodingType_e)(i= start[0] & 3); + size_t const blockSizeMax =3D ZSTD_blockSizeMax(dctx); =20 switch(litEncType) { @@ -134,13 +151,16 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, ZSTD_FALLTHROUGH; =20 case set_compressed: - RETURN_ERROR_IF(srcSize < 5, corruption_detected, "srcSize >= =3D MIN_CBLOCK_SIZE =3D=3D 3; here we need up to 5 for case 3"); + RETURN_ERROR_IF(srcSize < 5, corruption_detected, "srcSize >= =3D MIN_CBLOCK_SIZE =3D=3D 2; here we need up to 5 for case 3"); { size_t lhSize, litSize, litCSize; U32 singleStream=3D0; U32 const lhlCode =3D (istart[0] >> 2) & 3; U32 const lhc =3D MEM_readLE32(istart); size_t hufSuccess; - size_t expectedWriteSize =3D MIN(ZSTD_BLOCKSIZE_MAX, dstCa= pacity); + size_t expectedWriteSize =3D MIN(blockSizeMax, dstCapacity= ); + int const flags =3D 0 + | (ZSTD_DCtx_get_bmi2(dctx) ? HUF_flags_bmi2 : 0) + | (dctx->disableHufAsm ? 
HUF_flags_disableAsm : 0); switch(lhlCode) { case 0: case 1: default: /* note : default is impossible= , since lhlCode into [0..3] */ @@ -164,7 +184,11 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, break; } RETURN_ERROR_IF(litSize > 0 && dst =3D=3D NULL, dstSize_to= oSmall, "NULL not handled"); - RETURN_ERROR_IF(litSize > ZSTD_BLOCKSIZE_MAX, corruption_d= etected, ""); + RETURN_ERROR_IF(litSize > blockSizeMax, corruption_detecte= d, ""); + if (!singleStream) + RETURN_ERROR_IF(litSize < MIN_LITERALS_FOR_4_STREAMS, = literals_headerWrong, + "Not enough literals (%zu) for the 4-streams mode = (min %u)", + litSize, MIN_LITERALS_FOR_4_STREAMS); RETURN_ERROR_IF(litCSize + lhSize > srcSize, corruption_de= tected, ""); RETURN_ERROR_IF(expectedWriteSize < litSize , dstSize_tooS= mall, ""); ZSTD_allocateLiteralsBuffer(dctx, dst, dstCapacity, litSiz= e, streaming, expectedWriteSize, 0); @@ -176,13 +200,14 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, =20 if (litEncType=3D=3Dset_repeat) { if (singleStream) { - hufSuccess =3D HUF_decompress1X_usingDTable_bmi2( + hufSuccess =3D HUF_decompress1X_usingDTable( dctx->litBuffer, litSize, istart+lhSize, litCS= ize, - dctx->HUFptr, ZSTD_DCtx_get_bmi2(dctx)); + dctx->HUFptr, flags); } else { - hufSuccess =3D HUF_decompress4X_usingDTable_bmi2( + assert(litSize >=3D MIN_LITERALS_FOR_4_STREAMS); + hufSuccess =3D HUF_decompress4X_usingDTable( dctx->litBuffer, litSize, istart+lhSize, litCS= ize, - dctx->HUFptr, ZSTD_DCtx_get_bmi2(dctx)); + dctx->HUFptr, flags); } } else { if (singleStream) { @@ -190,26 +215,28 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, hufSuccess =3D HUF_decompress1X_DCtx_wksp( dctx->entropy.hufTable, dctx->litBuffer, litSi= ze, istart+lhSize, litCSize, dctx->workspace, - sizeof(dctx->workspace)); + sizeof(dctx->workspace), flags); #else - hufSuccess =3D HUF_decompress1X1_DCtx_wksp_bmi2( + hufSuccess =3D HUF_decompress1X1_DCtx_wksp( dctx->entropy.hufTable, dctx->litBuffer, litSi= ze, istart+lhSize, litCSize, dctx->workspace, - sizeof(dctx->workspace), ZSTD_DCtx_get_bmi2(dc= tx)); + sizeof(dctx->workspace), flags); #endif } else { - hufSuccess =3D HUF_decompress4X_hufOnly_wksp_bmi2( + hufSuccess =3D HUF_decompress4X_hufOnly_wksp( dctx->entropy.hufTable, dctx->litBuffer, litSi= ze, istart+lhSize, litCSize, dctx->workspace, - sizeof(dctx->workspace), ZSTD_DCtx_get_bmi2(dc= tx)); + sizeof(dctx->workspace), flags); } } if (dctx->litBufferLocation =3D=3D ZSTD_split) { + assert(litSize > ZSTD_LITBUFFEREXTRASIZE); ZSTD_memcpy(dctx->litExtraBuffer, dctx->litBufferEnd -= ZSTD_LITBUFFEREXTRASIZE, ZSTD_LITBUFFEREXTRASIZE); ZSTD_memmove(dctx->litBuffer + ZSTD_LITBUFFEREXTRASIZE= - WILDCOPY_OVERLENGTH, dctx->litBuffer, litSize - ZSTD_LITBUFFEREXTRASIZE); dctx->litBuffer +=3D ZSTD_LITBUFFEREXTRASIZE - WILDCOP= Y_OVERLENGTH; dctx->litBufferEnd -=3D WILDCOPY_OVERLENGTH; + assert(dctx->litBufferEnd <=3D (BYTE*)dst + blockSizeM= ax); } =20 RETURN_ERROR_IF(HUF_isError(hufSuccess), corruption_detect= ed, ""); @@ -224,7 +251,7 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, case set_basic: { size_t litSize, lhSize; U32 const lhlCode =3D ((istart[0]) >> 2) & 3; - size_t expectedWriteSize =3D MIN(ZSTD_BLOCKSIZE_MAX, dstCa= pacity); + size_t expectedWriteSize =3D MIN(blockSizeMax, dstCapacity= ); switch(lhlCode) { case 0: case 2: default: /* note : default is impossible= , since lhlCode into [0..3] */ @@ -237,11 +264,13 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, break; case 3: lhSize =3D 3; + RETURN_ERROR_IF(srcSize<3, corruption_detected, 
"srcSi= ze >=3D MIN_CBLOCK_SIZE =3D=3D 2; here we need lhSize =3D 3"); litSize =3D MEM_readLE24(istart) >> 4; break; } =20 RETURN_ERROR_IF(litSize > 0 && dst =3D=3D NULL, dstSize_to= oSmall, "NULL not handled"); + RETURN_ERROR_IF(litSize > blockSizeMax, corruption_detecte= d, ""); RETURN_ERROR_IF(expectedWriteSize < litSize, dstSize_tooSm= all, ""); ZSTD_allocateLiteralsBuffer(dctx, dst, dstCapacity, litSiz= e, streaming, expectedWriteSize, 1); if (lhSize+litSize+WILDCOPY_OVERLENGTH > srcSize) { /* ri= sk reading beyond src buffer with wildcopy */ @@ -270,7 +299,7 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, case set_rle: { U32 const lhlCode =3D ((istart[0]) >> 2) & 3; size_t litSize, lhSize; - size_t expectedWriteSize =3D MIN(ZSTD_BLOCKSIZE_MAX, dstCa= pacity); + size_t expectedWriteSize =3D MIN(blockSizeMax, dstCapacity= ); switch(lhlCode) { case 0: case 2: default: /* note : default is impossible= , since lhlCode into [0..3] */ @@ -279,16 +308,17 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, break; case 1: lhSize =3D 2; + RETURN_ERROR_IF(srcSize<3, corruption_detected, "srcSi= ze >=3D MIN_CBLOCK_SIZE =3D=3D 2; here we need lhSize+1 =3D 3"); litSize =3D MEM_readLE16(istart) >> 4; break; case 3: lhSize =3D 3; + RETURN_ERROR_IF(srcSize<4, corruption_detected, "srcSi= ze >=3D MIN_CBLOCK_SIZE =3D=3D 2; here we need lhSize+1 =3D 4"); litSize =3D MEM_readLE24(istart) >> 4; - RETURN_ERROR_IF(srcSize<4, corruption_detected, "srcSi= ze >=3D MIN_CBLOCK_SIZE =3D=3D 3; here we need lhSize+1 =3D 4"); break; } RETURN_ERROR_IF(litSize > 0 && dst =3D=3D NULL, dstSize_to= oSmall, "NULL not handled"); - RETURN_ERROR_IF(litSize > ZSTD_BLOCKSIZE_MAX, corruption_d= etected, ""); + RETURN_ERROR_IF(litSize > blockSizeMax, corruption_detecte= d, ""); RETURN_ERROR_IF(expectedWriteSize < litSize, dstSize_tooSm= all, ""); ZSTD_allocateLiteralsBuffer(dctx, dst, dstCapacity, litSiz= e, streaming, expectedWriteSize, 1); if (dctx->litBufferLocation =3D=3D ZSTD_split) @@ -310,6 +340,18 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, } } =20 +/* Hidden declaration for fullbench */ +size_t ZSTD_decodeLiteralsBlock_wrapper(ZSTD_DCtx* dctx, + const void* src, size_t srcSize, + void* dst, size_t dstCapacity); +size_t ZSTD_decodeLiteralsBlock_wrapper(ZSTD_DCtx* dctx, + const void* src, size_t srcSize, + void* dst, size_t dstCapacity) +{ + dctx->isFrameDecompression =3D 0; + return ZSTD_decodeLiteralsBlock(dctx, src, srcSize, dst, dstCapacity, = not_streaming); +} + /* Default FSE distribution tables. * These are pre-calculated FSE decoding tables using default distribution= s as defined in specification : * https://github.com/facebook/zstd/blob/release/doc/zstd_compression_form= at.md#default-distributions @@ -317,7 +359,7 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, * - start from default distributions, present in /lib/common/zstd_interna= l.h * - generate tables normally, using ZSTD_buildFSETable() * - printout the content of tables - * - pretify output, report below, test with fuzzer to ensure it's correct= */ + * - prettify output, report below, test with fuzzer to ensure it's correc= t */ =20 /* Default FSE distribution table for Literal Lengths */ static const ZSTD_seqSymbol LL_defaultDTable[(1<=3D0); + pos +=3D (size_t)n; } } /* Now we spread those positions across the table. - * The benefit of doing it in two stages is that we avoid the the + * The benefit of doing it in two stages is that we avoid the * variable size inner loop, which caused lots of branch misses. 
* Now we can run through all the positions without any branch mis= ses. - * We unroll the loop twice, since that is what emperically worked= best. + * We unroll the loop twice, since that is what empirically worked= best. */ { size_t position =3D 0; @@ -540,7 +583,7 @@ void ZSTD_buildFSETable_body(ZSTD_seqSymbol* dt, for (i=3D0; i highThreshold) position =3D (position + = step) & tableMask; /* lowprob area */ + while (UNLIKELY(position > highThreshold)) position =3D (p= osition + step) & tableMask; /* lowprob area */ } } assert(position =3D=3D 0); /* position must reach all cells once, = otherwise normalizedCounter is incorrect */ } @@ -551,7 +594,7 @@ void ZSTD_buildFSETable_body(ZSTD_seqSymbol* dt, for (u=3D0; u 0x7F) { if (nbSeq =3D=3D 0xFF) { RETURN_ERROR_IF(ip+2 > iend, srcSize_wrong, ""); @@ -681,11 +719,19 @@ size_t ZSTD_decodeSeqHeaders(ZSTD_DCtx* dctx, int* nb= SeqPtr, } *nbSeqPtr =3D nbSeq; =20 + if (nbSeq =3D=3D 0) { + /* No sequence : section ends immediately */ + RETURN_ERROR_IF(ip !=3D iend, corruption_detected, + "extraneous data present in the Sequences section"); + return (size_t)(ip - istart); + } + /* FSE table descriptors */ RETURN_ERROR_IF(ip+1 > iend, srcSize_wrong, ""); /* minimum possible s= ize: 1 byte for symbol encoding types */ - { symbolEncodingType_e const LLtype =3D (symbolEncodingType_e)(*ip >= > 6); - symbolEncodingType_e const OFtype =3D (symbolEncodingType_e)((*ip = >> 4) & 3); - symbolEncodingType_e const MLtype =3D (symbolEncodingType_e)((*ip = >> 2) & 3); + RETURN_ERROR_IF(*ip & 3, corruption_detected, ""); /* The last field, = Reserved, must be all-zeroes. */ + { SymbolEncodingType_e const LLtype =3D (SymbolEncodingType_e)(*ip >= > 6); + SymbolEncodingType_e const OFtype =3D (SymbolEncodingType_e)((*ip = >> 4) & 3); + SymbolEncodingType_e const MLtype =3D (SymbolEncodingType_e)((*ip = >> 2) & 3); ip++; =20 /* Build DTables */ @@ -829,7 +875,7 @@ static void ZSTD_safecopy(BYTE* op, const BYTE* const o= end_w, BYTE const* ip, pt /* ZSTD_safecopyDstBeforeSrc(): * This version allows overlap with dst before src, or handles the non-ove= rlap case with dst after src * Kept separate from more common ZSTD_safecopy case to avoid performance = impact to the safecopy common case */ -static void ZSTD_safecopyDstBeforeSrc(BYTE* op, BYTE const* ip, ptrdiff_t = length) { +static void ZSTD_safecopyDstBeforeSrc(BYTE* op, const BYTE* ip, ptrdiff_t = length) { ptrdiff_t const diff =3D op - ip; BYTE* const oend =3D op + length; =20 @@ -858,6 +904,7 @@ static void ZSTD_safecopyDstBeforeSrc(BYTE* op, BYTE co= nst* ip, ptrdiff_t length * to be optimized for many small sequences, since those fall into ZSTD_ex= ecSequence(). */ FORCE_NOINLINE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR size_t ZSTD_execSequenceEnd(BYTE* op, BYTE* const oend, seq_t sequence, const BYTE** litPtr, const BYTE* const litLimit, @@ -905,6 +952,7 @@ size_t ZSTD_execSequenceEnd(BYTE* op, * This version is intended to be used during instances where the litBuffe= r is still split. It is kept separate to avoid performance impact for the = good case. 
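
The nbSeq header decoded at the top of ZSTD_decodeSeqHeaders() above follows the Sequences_Section_Header encoding of RFC 8878: a first byte below 128 is the count itself, two bytes cover counts up to 0x7EFF, and the 0xFF escape adds a little-endian 16-bit value to LONGNBSEQ (0x7F00). A standalone sketch of just that decoding:

  /* returns header length in bytes, or 0 if more input is needed */
  static size_t demo_parse_nbSeq(const unsigned char* ip, size_t srcSize,
                                 unsigned* nbSeq)
  {
      if (srcSize < 1) return 0;
      if (ip[0] < 128) { *nbSeq = ip[0]; return 1; }
      if (ip[0] < 255) {
          if (srcSize < 2) return 0;
          *nbSeq = ((unsigned)(ip[0] - 0x80) << 8) + ip[1];
          return 2;
      }
      if (srcSize < 3) return 0;
      *nbSeq = ip[1] + ((unsigned)ip[2] << 8) + 0x7F00;   /* LONGNBSEQ */
      return 3;
  }
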
*/ FORCE_NOINLINE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR size_t ZSTD_execSequenceEndSplitLitBuffer(BYTE* op, BYTE* const oend, const BYTE* const oend_w, seq_t sequence, const BYTE** litPtr, const BYTE* const litLimit, @@ -950,6 +998,7 @@ size_t ZSTD_execSequenceEndSplitLitBuffer(BYTE* op, } =20 HINT_INLINE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR size_t ZSTD_execSequence(BYTE* op, BYTE* const oend, seq_t sequence, const BYTE** litPtr, const BYTE* const litLimit, @@ -964,6 +1013,11 @@ size_t ZSTD_execSequence(BYTE* op, =20 assert(op !=3D NULL /* Precondition */); assert(oend_w < oend /* No underflow */); + +#if defined(__aarch64__) + /* prefetch sequence starting from match that will be used for copy la= ter */ + PREFETCH_L1(match); +#endif /* Handle edge cases in a slow path: * - Read beyond end of literals * - Match end is within WILDCOPY_OVERLIMIT of oend @@ -1043,6 +1097,7 @@ size_t ZSTD_execSequence(BYTE* op, } =20 HINT_INLINE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR size_t ZSTD_execSequenceSplitLitBuffer(BYTE* op, BYTE* const oend, const BYTE* const oend_w, seq_t sequence, const BYTE** litPtr, const BYTE* const litLimit, @@ -1154,7 +1209,7 @@ ZSTD_updateFseStateWithDInfo(ZSTD_fseState* DStatePtr= , BIT_DStream_t* bitD, U16 } =20 /* We need to add at most (ZSTD_WINDOWLOG_MAX_32 - 1) bits to read the max= imum - * offset bits. But we can only read at most (STREAM_ACCUMULATOR_MIN_32 - = 1) + * offset bits. But we can only read at most STREAM_ACCUMULATOR_MIN_32 * bits before reloading. This value is the maximum number of bytes we read * after reloading when we are decoding long offsets. */ @@ -1165,13 +1220,37 @@ ZSTD_updateFseStateWithDInfo(ZSTD_fseState* DStateP= tr, BIT_DStream_t* bitD, U16 =20 typedef enum { ZSTD_lo_isRegularOffset, ZSTD_lo_isLongOffset=3D1 } ZSTD_lo= ngOffset_e; =20 +/* + * ZSTD_decodeSequence(): + * @p longOffsets : tells the decoder to reload more bit while decoding la= rge offsets + * only used in 32-bit mode + * @return : Sequence (litL + matchL + offset) + */ FORCE_INLINE_TEMPLATE seq_t -ZSTD_decodeSequence(seqState_t* seqState, const ZSTD_longOffset_e longOffs= ets) +ZSTD_decodeSequence(seqState_t* seqState, const ZSTD_longOffset_e longOffs= ets, const int isLastSeq) { seq_t seq; + /* + * ZSTD_seqSymbol is a 64 bits wide structure. + * It can be loaded in one operation + * and its fields extracted by simply shifting or bit-extracting on aa= rch64. + * GCC doesn't recognize this and generates more unnecessary ldr/ldrb/= ldrh + * operations that cause performance drop. This can be avoided by usin= g this + * ZSTD_memcpy hack. 
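
A self-contained illustration of that hack, using a stand-in struct with the same 8-byte layout as ZSTD_seqSymbol (U16, BYTE, BYTE, U32): copying the table entry out through memcpy lets GCC on aarch64 materialize it as a single 64-bit ldr and extract the fields with shifts, instead of issuing separate ldrh/ldrb/ldr loads per field.

  #include <string.h>
  #include <stdint.h>

  typedef struct {
      uint16_t nextState;
      uint8_t  nbAdditionalBits;
      uint8_t  nbBits;
      uint32_t baseValue;
  } demo_entry;   /* 8 bytes, mirroring ZSTD_seqSymbol */

  static uint32_t demo_load(const demo_entry* table, size_t state)
  {
      demo_entry e;
      memcpy(&e, table + state, sizeof(e));   /* one 64-bit load */
      return e.baseValue >> e.nbBits;         /* fields come out as bit extracts */
  }
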
+ */ +#if defined(__aarch64__) && (defined(__GNUC__) && !defined(__clang__)) + ZSTD_seqSymbol llDInfoS, mlDInfoS, ofDInfoS; + ZSTD_seqSymbol* const llDInfo =3D &llDInfoS; + ZSTD_seqSymbol* const mlDInfo =3D &mlDInfoS; + ZSTD_seqSymbol* const ofDInfo =3D &ofDInfoS; + ZSTD_memcpy(llDInfo, seqState->stateLL.table + seqState->stateLL.state= , sizeof(ZSTD_seqSymbol)); + ZSTD_memcpy(mlDInfo, seqState->stateML.table + seqState->stateML.state= , sizeof(ZSTD_seqSymbol)); + ZSTD_memcpy(ofDInfo, seqState->stateOffb.table + seqState->stateOffb.s= tate, sizeof(ZSTD_seqSymbol)); +#else const ZSTD_seqSymbol* const llDInfo =3D seqState->stateLL.table + seqS= tate->stateLL.state; const ZSTD_seqSymbol* const mlDInfo =3D seqState->stateML.table + seqS= tate->stateML.state; const ZSTD_seqSymbol* const ofDInfo =3D seqState->stateOffb.table + se= qState->stateOffb.state; +#endif seq.matchLength =3D mlDInfo->baseValue; seq.litLength =3D llDInfo->baseValue; { U32 const ofBase =3D ofDInfo->baseValue; @@ -1186,28 +1265,31 @@ ZSTD_decodeSequence(seqState_t* seqState, const ZST= D_longOffset_e longOffsets) U32 const llnbBits =3D llDInfo->nbBits; U32 const mlnbBits =3D mlDInfo->nbBits; U32 const ofnbBits =3D ofDInfo->nbBits; + + assert(llBits <=3D MaxLLBits); + assert(mlBits <=3D MaxMLBits); + assert(ofBits <=3D MaxOff); /* * As gcc has better branch and block analyzers, sometimes it is o= nly - * valuable to mark likelyness for clang, it gives around 3-4% of + * valuable to mark likeliness for clang, it gives around 3-4% of * performance. */ =20 /* sequence */ { size_t offset; - #if defined(__clang__) - if (LIKELY(ofBits > 1)) { - #else if (ofBits > 1) { - #endif ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset =3D=3D 1); ZSTD_STATIC_ASSERT(LONG_OFFSETS_MAX_EXTRA_BITS_32 =3D=3D 5= ); - assert(ofBits <=3D MaxOff); + ZSTD_STATIC_ASSERT(STREAM_ACCUMULATOR_MIN_32 > LONG_OFFSET= S_MAX_EXTRA_BITS_32); + ZSTD_STATIC_ASSERT(STREAM_ACCUMULATOR_MIN_32 - LONG_OFFSET= S_MAX_EXTRA_BITS_32 >=3D MaxMLBits); if (MEM_32bits() && longOffsets && (ofBits >=3D STREAM_ACC= UMULATOR_MIN_32)) { - U32 const extraBits =3D ofBits - MIN(ofBits, 32 - seqS= tate->DStream.bitsConsumed); + /* Always read extra bits, this keeps the logic simple, + * avoids branches, and avoids accidentally reading 0 = bits. + */ + U32 const extraBits =3D LONG_OFFSETS_MAX_EXTRA_BITS_32; offset =3D ofBase + (BIT_readBitsFast(&seqState->DStre= am, ofBits - extraBits) << extraBits); BIT_reloadDStream(&seqState->DStream); - if (extraBits) offset +=3D BIT_readBitsFast(&seqState-= >DStream, extraBits); - assert(extraBits <=3D LONG_OFFSETS_MAX_EXTRA_BITS_32);= /* to avoid another reload */ + offset +=3D BIT_readBitsFast(&seqState->DStream, extra= Bits); } else { offset =3D ofBase + BIT_readBitsFast(&seqState->DStrea= m, ofBits/*>0*/); /* <=3D (ZSTD_WINDOWLOG_MAX-1) bits */ if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream= ); @@ -1224,7 +1306,7 @@ ZSTD_decodeSequence(seqState_t* seqState, const ZSTD_= longOffset_e longOffsets) } else { offset =3D ofBase + ll0 + BIT_readBitsFast(&seqState->= DStream, 1); { size_t temp =3D (offset=3D=3D3) ? 
seqState->prevOf= fset[0] - 1 : seqState->prevOffset[offset]; - temp +=3D !temp; /* 0 is not valid; input is cor= rupted; force offset to 1 */ + temp -=3D !temp; /* 0 is not valid: input corrupte= d =3D> force offset to -1 =3D> corruption detected at execSequence */ if (offset !=3D 1) seqState->prevOffset[2] =3D seq= State->prevOffset[1]; seqState->prevOffset[1] =3D seqState->prevOffset[0= ]; seqState->prevOffset[0] =3D offset =3D temp; @@ -1232,11 +1314,7 @@ ZSTD_decodeSequence(seqState_t* seqState, const ZSTD= _longOffset_e longOffsets) seq.offset =3D offset; } =20 - #if defined(__clang__) - if (UNLIKELY(mlBits > 0)) - #else if (mlBits > 0) - #endif seq.matchLength +=3D BIT_readBitsFast(&seqState->DStream, mlBi= ts/*>0*/); =20 if (MEM_32bits() && (mlBits+llBits >=3D STREAM_ACCUMULATOR_MIN_32-= LONG_OFFSETS_MAX_EXTRA_BITS_32)) @@ -1246,11 +1324,7 @@ ZSTD_decodeSequence(seqState_t* seqState, const ZSTD= _longOffset_e longOffsets) /* Ensure there are enough bits to read the rest of data in 64-bit= mode. */ ZSTD_STATIC_ASSERT(16+LLFSELog+MLFSELog+OffFSELog < STREAM_ACCUMUL= ATOR_MIN_64); =20 - #if defined(__clang__) - if (UNLIKELY(llBits > 0)) - #else if (llBits > 0) - #endif seq.litLength +=3D BIT_readBitsFast(&seqState->DStream, llBits= /*>0*/); =20 if (MEM_32bits()) @@ -1259,17 +1333,22 @@ ZSTD_decodeSequence(seqState_t* seqState, const ZST= D_longOffset_e longOffsets) DEBUGLOG(6, "seq: litL=3D%u, matchL=3D%u, offset=3D%u", (U32)seq.litLength, (U32)seq.matchLength, (U32)seq.off= set); =20 - ZSTD_updateFseStateWithDInfo(&seqState->stateLL, &seqState->DStrea= m, llNext, llnbBits); /* <=3D 9 bits */ - ZSTD_updateFseStateWithDInfo(&seqState->stateML, &seqState->DStrea= m, mlNext, mlnbBits); /* <=3D 9 bits */ - if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream); /* <= =3D 18 bits */ - ZSTD_updateFseStateWithDInfo(&seqState->stateOffb, &seqState->DStr= eam, ofNext, ofnbBits); /* <=3D 8 bits */ + if (!isLastSeq) { + /* don't update FSE state for last Sequence */ + ZSTD_updateFseStateWithDInfo(&seqState->stateLL, &seqState->DS= tream, llNext, llnbBits); /* <=3D 9 bits */ + ZSTD_updateFseStateWithDInfo(&seqState->stateML, &seqState->DS= tream, mlNext, mlnbBits); /* <=3D 9 bits */ + if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream); /*= <=3D 18 bits */ + ZSTD_updateFseStateWithDInfo(&seqState->stateOffb, &seqState->= DStream, ofNext, ofnbBits); /* <=3D 8 bits */ + BIT_reloadDStream(&seqState->DStream); + } } =20 return seq; } =20 -#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION -MEM_STATIC int ZSTD_dictionaryIsActive(ZSTD_DCtx const* dctx, BYTE const* = prefixStart, BYTE const* oLitEnd) +#if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) +#if DEBUGLEVEL >=3D 1 +static int ZSTD_dictionaryIsActive(ZSTD_DCtx const* dctx, BYTE const* pref= ixStart, BYTE const* oLitEnd) { size_t const windowSize =3D dctx->fParams.windowSize; /* No dictionary used. */ @@ -1283,30 +1362,33 @@ MEM_STATIC int ZSTD_dictionaryIsActive(ZSTD_DCtx co= nst* dctx, BYTE const* prefix /* Dictionary is active. 
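
The prevOffset[] juggling above implements the repeat-offset rules of RFC 8878: offset values 1..3 index the recent-offsets history, shifted by one when the sequence carries no literals, with "3 and zero literals" meaning the most recent offset minus one. A sketch of the resolution rule in isolation, with the history update omitted and rep[0] holding the most recent offset:

  static unsigned demo_resolve_offset(unsigned offsetValue, unsigned litLength,
                                      const unsigned rep[3])
  {
      if (offsetValue > 3)
          return offsetValue - 3;   /* a plain offset, not a repeat code */
      if (litLength == 0)
          offsetValue += 1;         /* 1->rep[1], 2->rep[2], 3->rep[0]-1 */
      switch (offsetValue) {
      case 1:  return rep[0];
      case 2:  return rep[1];
      case 3:  return rep[2];
      default: return rep[0] - 1;   /* a result of 0 is invalid input; the
                                       `temp -= !temp` change above turns it
                                       into an out-of-range offset that is
                                       caught in ZSTD_execSequence */
      }
  }
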
*/ return 1; } +#endif =20 -MEM_STATIC void ZSTD_assertValidSequence( +static void ZSTD_assertValidSequence( ZSTD_DCtx const* dctx, BYTE const* op, BYTE const* oend, seq_t const seq, BYTE const* prefixStart, BYTE const* virtualStart) { #if DEBUGLEVEL >=3D 1 - size_t const windowSize =3D dctx->fParams.windowSize; - size_t const sequenceSize =3D seq.litLength + seq.matchLength; - BYTE const* const oLitEnd =3D op + seq.litLength; - DEBUGLOG(6, "Checking sequence: litL=3D%u matchL=3D%u offset=3D%u", - (U32)seq.litLength, (U32)seq.matchLength, (U32)seq.offset); - assert(op <=3D oend); - assert((size_t)(oend - op) >=3D sequenceSize); - assert(sequenceSize <=3D ZSTD_BLOCKSIZE_MAX); - if (ZSTD_dictionaryIsActive(dctx, prefixStart, oLitEnd)) { - size_t const dictSize =3D (size_t)((char const*)dctx->dictContentE= ndForFuzzing - (char const*)dctx->dictContentBeginForFuzzing); - /* Offset must be within the dictionary. */ - assert(seq.offset <=3D (size_t)(oLitEnd - virtualStart)); - assert(seq.offset <=3D windowSize + dictSize); - } else { - /* Offset must be within our window. */ - assert(seq.offset <=3D windowSize); + if (dctx->isFrameDecompression) { + size_t const windowSize =3D dctx->fParams.windowSize; + size_t const sequenceSize =3D seq.litLength + seq.matchLength; + BYTE const* const oLitEnd =3D op + seq.litLength; + DEBUGLOG(6, "Checking sequence: litL=3D%u matchL=3D%u offset=3D%u", + (U32)seq.litLength, (U32)seq.matchLength, (U32)seq.offset); + assert(op <=3D oend); + assert((size_t)(oend - op) >=3D sequenceSize); + assert(sequenceSize <=3D ZSTD_blockSizeMax(dctx)); + if (ZSTD_dictionaryIsActive(dctx, prefixStart, oLitEnd)) { + size_t const dictSize =3D (size_t)((char const*)dctx->dictCont= entEndForFuzzing - (char const*)dctx->dictContentBeginForFuzzing); + /* Offset must be within the dictionary. */ + assert(seq.offset <=3D (size_t)(oLitEnd - virtualStart)); + assert(seq.offset <=3D windowSize + dictSize); + } else { + /* Offset must be within our window. 
*/ + assert(seq.offset <=3D windowSize); + } } #else (void)dctx, (void)op, (void)oend, (void)seq, (void)prefixStart, (void)= virtualStart; @@ -1322,23 +1404,21 @@ DONT_VECTORIZE ZSTD_decompressSequences_bodySplitLitBuffer( ZSTD_DCtx* dctx, void* dst, size_t maxDstSize, const void* seqStart, size_t seqSize, int nbSeq, - const ZSTD_longOffset_e isLongOffset, - const int frame) + const ZSTD_longOffset_e isLongOffset) { const BYTE* ip =3D (const BYTE*)seqStart; const BYTE* const iend =3D ip + seqSize; BYTE* const ostart =3D (BYTE*)dst; - BYTE* const oend =3D ostart + maxDstSize; + BYTE* const oend =3D ZSTD_maybeNullPtrAdd(ostart, maxDstSize); BYTE* op =3D ostart; const BYTE* litPtr =3D dctx->litPtr; const BYTE* litBufferEnd =3D dctx->litBufferEnd; const BYTE* const prefixStart =3D (const BYTE*) (dctx->prefixStart); const BYTE* const vBase =3D (const BYTE*) (dctx->virtualStart); const BYTE* const dictEnd =3D (const BYTE*) (dctx->dictEnd); - DEBUGLOG(5, "ZSTD_decompressSequences_bodySplitLitBuffer"); - (void)frame; + DEBUGLOG(5, "ZSTD_decompressSequences_bodySplitLitBuffer (%i seqs)", n= bSeq); =20 - /* Regen sequences */ + /* Literals are split between internal buffer & output buffer */ if (nbSeq) { seqState_t seqState; dctx->fseEntropy =3D 1; @@ -1357,8 +1437,7 @@ ZSTD_decompressSequences_bodySplitLitBuffer( ZSTD_DCt= x* dctx, BIT_DStream_completed < BIT_DStream_overflow); =20 /* decompress without overrunning litPtr begins */ - { - seq_t sequence =3D ZSTD_decodeSequence(&seqState, isLongOffset= ); + { seq_t sequence =3D {0,0,0}; /* some static analyzer believe t= hat @sequence is not initialized (it necessarily is, since for(;;) loop as = at least one iteration) */ /* Align the decompression loop to 32 + 16 bytes. * * zstd compiled with gcc-9 on an Intel i9-9900k shows 10% = decompression @@ -1420,27 +1499,26 @@ ZSTD_decompressSequences_bodySplitLitBuffer( ZSTD_D= Ctx* dctx, #endif =20 /* Handle the initial state where litBuffer is currently split= between dst and litExtraBuffer */ - for (; litPtr + sequence.litLength <=3D dctx->litBufferEnd; ) { - size_t const oneSeqSize =3D ZSTD_execSequenceSplitLitBuffe= r(op, oend, litPtr + sequence.litLength - WILDCOPY_OVERLENGTH, sequence, &l= itPtr, litBufferEnd, prefixStart, vBase, dictEnd); + for ( ; nbSeq; nbSeq--) { + sequence =3D ZSTD_decodeSequence(&seqState, isLongOffset, = nbSeq=3D=3D1); + if (litPtr + sequence.litLength > dctx->litBufferEnd) brea= k; + { size_t const oneSeqSize =3D ZSTD_execSequenceSplitLitB= uffer(op, oend, litPtr + sequence.litLength - WILDCOPY_OVERLENGTH, sequence= , &litPtr, litBufferEnd, prefixStart, vBase, dictEnd); #if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) - assert(!ZSTD_isError(oneSeqSize)); - if (frame) ZSTD_assertValidSequence(dctx, op, oend, sequen= ce, prefixStart, vBase); + assert(!ZSTD_isError(oneSeqSize)); + ZSTD_assertValidSequence(dctx, op, oend, sequence, pre= fixStart, vBase); #endif - if (UNLIKELY(ZSTD_isError(oneSeqSize))) - return oneSeqSize; - DEBUGLOG(6, "regenerated sequence size : %u", (U32)oneSeqS= ize); - op +=3D oneSeqSize; - if (UNLIKELY(!--nbSeq)) - break; - BIT_reloadDStream(&(seqState.DStream)); - sequence =3D ZSTD_decodeSequence(&seqState, isLongOffset); - } + if (UNLIKELY(ZSTD_isError(oneSeqSize))) + return oneSeqSize; + DEBUGLOG(6, "regenerated sequence size : %u", (U32)one= SeqSize); + op +=3D oneSeqSize; + } } + DEBUGLOG(6, "reached: (litPtr + sequence.litLength > dctx->lit= BufferEnd)"); =20 /* If there are more sequences, they will 
need to read literal= s from litExtraBuffer; copy over the remainder from dst and update litPtr a= nd litEnd */ if (nbSeq > 0) { const size_t leftoverLit =3D dctx->litBufferEnd - litPtr; - if (leftoverLit) - { + DEBUGLOG(6, "There are %i sequences left, and %zu/%zu lite= rals left in buffer", nbSeq, leftoverLit, sequence.litLength); + if (leftoverLit) { RETURN_ERROR_IF(leftoverLit > (size_t)(oend - op), dst= Size_tooSmall, "remaining lit must fit within dstBuffer"); ZSTD_safecopyDstBeforeSrc(op, litPtr, leftoverLit); sequence.litLength -=3D leftoverLit; @@ -1449,24 +1527,22 @@ ZSTD_decompressSequences_bodySplitLitBuffer( ZSTD_D= Ctx* dctx, litPtr =3D dctx->litExtraBuffer; litBufferEnd =3D dctx->litExtraBuffer + ZSTD_LITBUFFEREXTR= ASIZE; dctx->litBufferLocation =3D ZSTD_not_in_dst; - { - size_t const oneSeqSize =3D ZSTD_execSequence(op, oend= , sequence, &litPtr, litBufferEnd, prefixStart, vBase, dictEnd); + { size_t const oneSeqSize =3D ZSTD_execSequence(op, oend= , sequence, &litPtr, litBufferEnd, prefixStart, vBase, dictEnd); #if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) assert(!ZSTD_isError(oneSeqSize)); - if (frame) ZSTD_assertValidSequence(dctx, op, oend, se= quence, prefixStart, vBase); + ZSTD_assertValidSequence(dctx, op, oend, sequence, pre= fixStart, vBase); #endif if (UNLIKELY(ZSTD_isError(oneSeqSize))) return oneSeqSize; DEBUGLOG(6, "regenerated sequence size : %u", (U32)one= SeqSize); op +=3D oneSeqSize; - if (--nbSeq) - BIT_reloadDStream(&(seqState.DStream)); } + nbSeq--; } } =20 - if (nbSeq > 0) /* there is remaining lit from extra buffer */ - { + if (nbSeq > 0) { + /* there is remaining lit from extra buffer */ =20 #if defined(__x86_64__) __asm__(".p2align 6"); @@ -1485,35 +1561,34 @@ ZSTD_decompressSequences_bodySplitLitBuffer( ZSTD_D= Ctx* dctx, # endif #endif =20 - for (; ; ) { - seq_t const sequence =3D ZSTD_decodeSequence(&seqState, is= LongOffset); + for ( ; nbSeq ; nbSeq--) { + seq_t const sequence =3D ZSTD_decodeSequence(&seqState, is= LongOffset, nbSeq=3D=3D1); size_t const oneSeqSize =3D ZSTD_execSequence(op, oend, se= quence, &litPtr, litBufferEnd, prefixStart, vBase, dictEnd); #if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) assert(!ZSTD_isError(oneSeqSize)); - if (frame) ZSTD_assertValidSequence(dctx, op, oend, sequen= ce, prefixStart, vBase); + ZSTD_assertValidSequence(dctx, op, oend, sequence, prefixS= tart, vBase); #endif if (UNLIKELY(ZSTD_isError(oneSeqSize))) return oneSeqSize; DEBUGLOG(6, "regenerated sequence size : %u", (U32)oneSeqS= ize); op +=3D oneSeqSize; - if (UNLIKELY(!--nbSeq)) - break; - BIT_reloadDStream(&(seqState.DStream)); } } =20 /* check if reached exact end */ DEBUGLOG(5, "ZSTD_decompressSequences_bodySplitLitBuffer: after de= code loop, remaining nbSeq : %i", nbSeq); RETURN_ERROR_IF(nbSeq, corruption_detected, ""); - RETURN_ERROR_IF(BIT_reloadDStream(&seqState.DStream) < BIT_DStream= _completed, corruption_detected, ""); + DEBUGLOG(5, "bitStream : start=3D%p, ptr=3D%p, bitsConsumed=3D%u",= seqState.DStream.start, seqState.DStream.ptr, seqState.DStream.bitsConsume= d); + RETURN_ERROR_IF(!BIT_endOfDStream(&seqState.DStream), corruption_d= etected, ""); /* save reps for next block */ { U32 i; for (i=3D0; ientropy.rep[i] =3D= (U32)(seqState.prevOffset[i]); } } =20 /* last literal segment */ - if (dctx->litBufferLocation =3D=3D ZSTD_split) /* split hasn't been r= eached yet, first get dst then copy litExtraBuffer */ - { - size_t const 
lastLLSize =3D litBufferEnd - litPtr; + if (dctx->litBufferLocation =3D=3D ZSTD_split) { + /* split hasn't been reached yet, first get dst then copy litExtra= Buffer */ + size_t const lastLLSize =3D (size_t)(litBufferEnd - litPtr); + DEBUGLOG(6, "copy last literals from segment : %u", (U32)lastLLSiz= e); RETURN_ERROR_IF(lastLLSize > (size_t)(oend - op), dstSize_tooSmall= , ""); if (op !=3D NULL) { ZSTD_memmove(op, litPtr, lastLLSize); @@ -1523,15 +1598,17 @@ ZSTD_decompressSequences_bodySplitLitBuffer( ZSTD_D= Ctx* dctx, litBufferEnd =3D dctx->litExtraBuffer + ZSTD_LITBUFFEREXTRASIZE; dctx->litBufferLocation =3D ZSTD_not_in_dst; } - { size_t const lastLLSize =3D litBufferEnd - litPtr; + /* copy last literals from internal buffer */ + { size_t const lastLLSize =3D (size_t)(litBufferEnd - litPtr); + DEBUGLOG(6, "copy last literals from internal buffer : %u", (U32)l= astLLSize); RETURN_ERROR_IF(lastLLSize > (size_t)(oend-op), dstSize_tooSmall, = ""); if (op !=3D NULL) { ZSTD_memcpy(op, litPtr, lastLLSize); op +=3D lastLLSize; - } - } + } } =20 - return op-ostart; + DEBUGLOG(6, "decoded block of size %u bytes", (U32)(op - ostart)); + return (size_t)(op - ostart); } =20 FORCE_INLINE_TEMPLATE size_t @@ -1539,21 +1616,19 @@ DONT_VECTORIZE ZSTD_decompressSequences_body(ZSTD_DCtx* dctx, void* dst, size_t maxDstSize, const void* seqStart, size_t seqSize, int nbSeq, - const ZSTD_longOffset_e isLongOffset, - const int frame) + const ZSTD_longOffset_e isLongOffset) { const BYTE* ip =3D (const BYTE*)seqStart; const BYTE* const iend =3D ip + seqSize; BYTE* const ostart =3D (BYTE*)dst; - BYTE* const oend =3D dctx->litBufferLocation =3D=3D ZSTD_not_in_dst ? = ostart + maxDstSize : dctx->litBuffer; + BYTE* const oend =3D dctx->litBufferLocation =3D=3D ZSTD_not_in_dst ? 
= ZSTD_maybeNullPtrAdd(ostart, maxDstSize) : dctx->litBuffer; BYTE* op =3D ostart; const BYTE* litPtr =3D dctx->litPtr; const BYTE* const litEnd =3D litPtr + dctx->litSize; const BYTE* const prefixStart =3D (const BYTE*)(dctx->prefixStart); const BYTE* const vBase =3D (const BYTE*)(dctx->virtualStart); const BYTE* const dictEnd =3D (const BYTE*)(dctx->dictEnd); - DEBUGLOG(5, "ZSTD_decompressSequences_body"); - (void)frame; + DEBUGLOG(5, "ZSTD_decompressSequences_body: nbSeq =3D %d", nbSeq); =20 /* Regen sequences */ if (nbSeq) { @@ -1568,11 +1643,6 @@ ZSTD_decompressSequences_body(ZSTD_DCtx* dctx, ZSTD_initFseState(&seqState.stateML, &seqState.DStream, dctx->MLTp= tr); assert(dst !=3D NULL); =20 - ZSTD_STATIC_ASSERT( - BIT_DStream_unfinished < BIT_DStream_completed && - BIT_DStream_endOfBuffer < BIT_DStream_completed && - BIT_DStream_completed < BIT_DStream_overflow); - #if defined(__x86_64__) __asm__(".p2align 6"); __asm__("nop"); @@ -1587,73 +1657,70 @@ ZSTD_decompressSequences_body(ZSTD_DCtx* dctx, # endif #endif =20 - for ( ; ; ) { - seq_t const sequence =3D ZSTD_decodeSequence(&seqState, isLong= Offset); + for ( ; nbSeq ; nbSeq--) { + seq_t const sequence =3D ZSTD_decodeSequence(&seqState, isLong= Offset, nbSeq=3D=3D1); size_t const oneSeqSize =3D ZSTD_execSequence(op, oend, sequen= ce, &litPtr, litEnd, prefixStart, vBase, dictEnd); #if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) assert(!ZSTD_isError(oneSeqSize)); - if (frame) ZSTD_assertValidSequence(dctx, op, oend, sequence, = prefixStart, vBase); + ZSTD_assertValidSequence(dctx, op, oend, sequence, prefixStart= , vBase); #endif if (UNLIKELY(ZSTD_isError(oneSeqSize))) return oneSeqSize; DEBUGLOG(6, "regenerated sequence size : %u", (U32)oneSeqSize); op +=3D oneSeqSize; - if (UNLIKELY(!--nbSeq)) - break; - BIT_reloadDStream(&(seqState.DStream)); } =20 /* check if reached exact end */ - DEBUGLOG(5, "ZSTD_decompressSequences_body: after decode loop, rem= aining nbSeq : %i", nbSeq); - RETURN_ERROR_IF(nbSeq, corruption_detected, ""); - RETURN_ERROR_IF(BIT_reloadDStream(&seqState.DStream) < BIT_DStream= _completed, corruption_detected, ""); + assert(nbSeq =3D=3D 0); + RETURN_ERROR_IF(!BIT_endOfDStream(&seqState.DStream), corruption_d= etected, ""); /* save reps for next block */ { U32 i; for (i=3D0; ientropy.rep[i] =3D= (U32)(seqState.prevOffset[i]); } } =20 /* last literal segment */ - { size_t const lastLLSize =3D litEnd - litPtr; + { size_t const lastLLSize =3D (size_t)(litEnd - litPtr); + DEBUGLOG(6, "copy last literals : %u", (U32)lastLLSize); RETURN_ERROR_IF(lastLLSize > (size_t)(oend-op), dstSize_tooSmall, = ""); if (op !=3D NULL) { ZSTD_memcpy(op, litPtr, lastLLSize); op +=3D lastLLSize; - } - } + } } =20 - return op-ostart; + DEBUGLOG(6, "decoded block of size %u bytes", (U32)(op - ostart)); + return (size_t)(op - ostart); } =20 static size_t ZSTD_decompressSequences_default(ZSTD_DCtx* dctx, void* dst, size_t maxDstSize, const void* seqStart, size_t seqSize, int nbSeq, - const ZSTD_longOffset_e isLongOffset, - const int frame) + const ZSTD_longOffset_e isLongOffset) { - return ZSTD_decompressSequences_body(dctx, dst, maxDstSize, seqStart, = seqSize, nbSeq, isLongOffset, frame); + return ZSTD_decompressSequences_body(dctx, dst, maxDstSize, seqStart, = seqSize, nbSeq, isLongOffset); } =20 static size_t ZSTD_decompressSequencesSplitLitBuffer_default(ZSTD_DCtx* dctx, void* dst, size_t maxDstSiz= e, const void* seqStart, size_t seqS= ize, int nbSeq, - const ZSTD_longOffset_e isLongOff= 
set, - const int frame) + const ZSTD_longOffset_e isLongOff= set) { - return ZSTD_decompressSequences_bodySplitLitBuffer(dctx, dst, maxDstSi= ze, seqStart, seqSize, nbSeq, isLongOffset, frame); + return ZSTD_decompressSequences_bodySplitLitBuffer(dctx, dst, maxDstSi= ze, seqStart, seqSize, nbSeq, isLongOffset); } #endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG */ =20 #ifndef ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT =20 -FORCE_INLINE_TEMPLATE size_t -ZSTD_prefetchMatch(size_t prefetchPos, seq_t const sequence, +FORCE_INLINE_TEMPLATE + +size_t ZSTD_prefetchMatch(size_t prefetchPos, seq_t const sequence, const BYTE* const prefixStart, const BYTE* const dictEn= d) { prefetchPos +=3D sequence.litLength; { const BYTE* const matchBase =3D (sequence.offset > prefetchPos) ? = dictEnd : prefixStart; - const BYTE* const match =3D matchBase + prefetchPos - sequence.off= set; /* note : this operation can overflow when seq.offset is really too la= rge, which can only happen when input is corrupted. - = * No consequence though : memory address is only used for prefetching, = not for dereferencing */ + /* note : this operation can overflow when seq.offset is really to= o large, which can only happen when input is corrupted. + * No consequence though : memory address is only used for prefetc= hing, not for dereferencing */ + const BYTE* const match =3D ZSTD_wrappedPtrSub(ZSTD_wrappedPtrAdd(= matchBase, prefetchPos), sequence.offset); PREFETCH_L1(match); PREFETCH_L1(match+CACHELINE_SIZE); /* note := it's safe to invoke PREFETCH() on any memory address, including invalid on= es */ } return prefetchPos + sequence.matchLength; @@ -1668,20 +1735,18 @@ ZSTD_decompressSequencesLong_body( ZSTD_DCtx* dctx, void* dst, size_t maxDstSize, const void* seqStart, size_t seqSize, int nbSeq, - const ZSTD_longOffset_e isLongOffset, - const int frame) + const ZSTD_longOffset_e isLongOffset) { const BYTE* ip =3D (const BYTE*)seqStart; const BYTE* const iend =3D ip + seqSize; BYTE* const ostart =3D (BYTE*)dst; - BYTE* const oend =3D dctx->litBufferLocation =3D=3D ZSTD_in_dst ? dctx= ->litBuffer : ostart + maxDstSize; + BYTE* const oend =3D dctx->litBufferLocation =3D=3D ZSTD_in_dst ? 
dctx= ->litBuffer : ZSTD_maybeNullPtrAdd(ostart, maxDstSize); BYTE* op =3D ostart; const BYTE* litPtr =3D dctx->litPtr; const BYTE* litBufferEnd =3D dctx->litBufferEnd; const BYTE* const prefixStart =3D (const BYTE*) (dctx->prefixStart); const BYTE* const dictStart =3D (const BYTE*) (dctx->virtualStart); const BYTE* const dictEnd =3D (const BYTE*) (dctx->dictEnd); - (void)frame; =20 /* Regen sequences */ if (nbSeq) { @@ -1706,20 +1771,17 @@ ZSTD_decompressSequencesLong_body( ZSTD_initFseState(&seqState.stateML, &seqState.DStream, dctx->MLTp= tr); =20 /* prepare in advance */ - for (seqNb=3D0; (BIT_reloadDStream(&seqState.DStream) <=3D BIT_DSt= ream_completed) && (seqNblitBufferLocation =3D=3D ZSTD_split && litPtr + sequ= ences[(seqNb - ADVANCED_SEQS) & STORED_SEQS_MASK].litLength > dctx->litBuff= erEnd) - { + if (dctx->litBufferLocation =3D=3D ZSTD_split && litPtr + sequ= ences[(seqNb - ADVANCED_SEQS) & STORED_SEQS_MASK].litLength > dctx->litBuff= erEnd) { /* lit buffer is reaching split point, empty out the first= buffer and transition to litExtraBuffer */ const size_t leftoverLit =3D dctx->litBufferEnd - litPtr; if (leftoverLit) @@ -1732,26 +1794,26 @@ ZSTD_decompressSequencesLong_body( litPtr =3D dctx->litExtraBuffer; litBufferEnd =3D dctx->litExtraBuffer + ZSTD_LITBUFFEREXTR= ASIZE; dctx->litBufferLocation =3D ZSTD_not_in_dst; - oneSeqSize =3D ZSTD_execSequence(op, oend, sequences[(seqN= b - ADVANCED_SEQS) & STORED_SEQS_MASK], &litPtr, litBufferEnd, prefixStart,= dictStart, dictEnd); + { size_t const oneSeqSize =3D ZSTD_execSequence(op, oend= , sequences[(seqNb - ADVANCED_SEQS) & STORED_SEQS_MASK], &litPtr, litBuffer= End, prefixStart, dictStart, dictEnd); #if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) - assert(!ZSTD_isError(oneSeqSize)); - if (frame) ZSTD_assertValidSequence(dctx, op, oend, sequen= ces[(seqNb - ADVANCED_SEQS) & STORED_SEQS_MASK], prefixStart, dictStart); + assert(!ZSTD_isError(oneSeqSize)); + ZSTD_assertValidSequence(dctx, op, oend, sequences[(se= qNb - ADVANCED_SEQS) & STORED_SEQS_MASK], prefixStart, dictStart); #endif - if (ZSTD_isError(oneSeqSize)) return oneSeqSize; + if (ZSTD_isError(oneSeqSize)) return oneSeqSize; =20 - prefetchPos =3D ZSTD_prefetchMatch(prefetchPos, sequence, = prefixStart, dictEnd); - sequences[seqNb & STORED_SEQS_MASK] =3D sequence; - op +=3D oneSeqSize; - } + prefetchPos =3D ZSTD_prefetchMatch(prefetchPos, sequen= ce, prefixStart, dictEnd); + sequences[seqNb & STORED_SEQS_MASK] =3D sequence; + op +=3D oneSeqSize; + } } else { /* lit buffer is either wholly contained in first or secon= d split, or not split at all*/ - oneSeqSize =3D dctx->litBufferLocation =3D=3D ZSTD_split ? + size_t const oneSeqSize =3D dctx->litBufferLocation =3D=3D= ZSTD_split ? 
ZSTD_execSequenceSplitLitBuffer(op, oend, litPtr + seq= uences[(seqNb - ADVANCED_SEQS) & STORED_SEQS_MASK].litLength - WILDCOPY_OVE= RLENGTH, sequences[(seqNb - ADVANCED_SEQS) & STORED_SEQS_MASK], &litPtr, li= tBufferEnd, prefixStart, dictStart, dictEnd) : ZSTD_execSequence(op, oend, sequences[(seqNb - ADVANCE= D_SEQS) & STORED_SEQS_MASK], &litPtr, litBufferEnd, prefixStart, dictStart,= dictEnd); #if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) assert(!ZSTD_isError(oneSeqSize)); - if (frame) ZSTD_assertValidSequence(dctx, op, oend, sequen= ces[(seqNb - ADVANCED_SEQS) & STORED_SEQS_MASK], prefixStart, dictStart); + ZSTD_assertValidSequence(dctx, op, oend, sequences[(seqNb = - ADVANCED_SEQS) & STORED_SEQS_MASK], prefixStart, dictStart); #endif if (ZSTD_isError(oneSeqSize)) return oneSeqSize; =20 @@ -1760,17 +1822,15 @@ ZSTD_decompressSequencesLong_body( op +=3D oneSeqSize; } } - RETURN_ERROR_IF(seqNblitBufferLocation =3D=3D ZSTD_split && litPtr + sequ= ence->litLength > dctx->litBufferEnd) - { + if (dctx->litBufferLocation =3D=3D ZSTD_split && litPtr + sequ= ence->litLength > dctx->litBufferEnd) { const size_t leftoverLit =3D dctx->litBufferEnd - litPtr; - if (leftoverLit) - { + if (leftoverLit) { RETURN_ERROR_IF(leftoverLit > (size_t)(oend - op), dst= Size_tooSmall, "remaining lit must fit within dstBuffer"); ZSTD_safecopyDstBeforeSrc(op, litPtr, leftoverLit); sequence->litLength -=3D leftoverLit; @@ -1779,11 +1839,10 @@ ZSTD_decompressSequencesLong_body( litPtr =3D dctx->litExtraBuffer; litBufferEnd =3D dctx->litExtraBuffer + ZSTD_LITBUFFEREXTR= ASIZE; dctx->litBufferLocation =3D ZSTD_not_in_dst; - { - size_t const oneSeqSize =3D ZSTD_execSequence(op, oend= , *sequence, &litPtr, litBufferEnd, prefixStart, dictStart, dictEnd); + { size_t const oneSeqSize =3D ZSTD_execSequence(op, oend= , *sequence, &litPtr, litBufferEnd, prefixStart, dictStart, dictEnd); #if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) assert(!ZSTD_isError(oneSeqSize)); - if (frame) ZSTD_assertValidSequence(dctx, op, oend, se= quences[seqNb&STORED_SEQS_MASK], prefixStart, dictStart); + ZSTD_assertValidSequence(dctx, op, oend, sequences[seq= Nb&STORED_SEQS_MASK], prefixStart, dictStart); #endif if (ZSTD_isError(oneSeqSize)) return oneSeqSize; op +=3D oneSeqSize; @@ -1796,7 +1855,7 @@ ZSTD_decompressSequencesLong_body( ZSTD_execSequence(op, oend, *sequence, &litPtr, litBuf= ferEnd, prefixStart, dictStart, dictEnd); #if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) assert(!ZSTD_isError(oneSeqSize)); - if (frame) ZSTD_assertValidSequence(dctx, op, oend, sequen= ces[seqNb&STORED_SEQS_MASK], prefixStart, dictStart); + ZSTD_assertValidSequence(dctx, op, oend, sequences[seqNb&S= TORED_SEQS_MASK], prefixStart, dictStart); #endif if (ZSTD_isError(oneSeqSize)) return oneSeqSize; op +=3D oneSeqSize; @@ -1808,8 +1867,7 @@ ZSTD_decompressSequencesLong_body( } =20 /* last literal segment */ - if (dctx->litBufferLocation =3D=3D ZSTD_split) /* first deplete liter= al buffer in dst, then copy litExtraBuffer */ - { + if (dctx->litBufferLocation =3D=3D ZSTD_split) { /* first deplete lite= ral buffer in dst, then copy litExtraBuffer */ size_t const lastLLSize =3D litBufferEnd - litPtr; RETURN_ERROR_IF(lastLLSize > (size_t)(oend - op), dstSize_tooSmall= , ""); if (op !=3D NULL) { @@ -1827,17 +1885,16 @@ ZSTD_decompressSequencesLong_body( } } =20 - return op-ostart; + return (size_t)(op - ostart); } =20 
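
Stripped of the split-buffer handling, the long-offsets body above is a classic software pipeline: decode a few sequences ahead of execution, prefetch the memory each pending match will read, then execute with a fixed lag so the prefetches have time to land. All names below are illustrative stand-ins, with DEMO_STORED/DEMO_MASK playing the roles of STORED_SEQS/STORED_SEQS_MASK in the real code:

  typedef struct { unsigned litLength, matchLength, offset; } demo_seq;

  extern demo_seq demo_decodeNext(void);       /* stands in for ZSTD_decodeSequence */
  extern void demo_prefetchMatch(demo_seq);    /* stands in for ZSTD_prefetchMatch */
  extern void demo_execute(demo_seq);          /* stands in for ZSTD_execSequence */

  #define DEMO_STORED 8
  #define DEMO_MASK   (DEMO_STORED - 1)

  static void demo_pipeline(int nbSeq)
  {
      demo_seq ring[DEMO_STORED];
      int const advance = nbSeq < DEMO_STORED ? nbSeq : DEMO_STORED;
      int seqNb;
      for (seqNb = 0; seqNb < advance; seqNb++) {    /* fill the pipeline */
          ring[seqNb] = demo_decodeNext();
          demo_prefetchMatch(ring[seqNb]);
      }
      for (; seqNb < nbSeq; seqNb++) {               /* steady state: execution
                                                        lags decoding by 'advance' */
          demo_seq const cur = demo_decodeNext();
          demo_execute(ring[(seqNb - advance) & DEMO_MASK]);
          demo_prefetchMatch(cur);
          ring[seqNb & DEMO_MASK] = cur;
      }
      for (seqNb -= advance; seqNb < nbSeq; seqNb++) /* drain what remains */
          demo_execute(ring[seqNb & DEMO_MASK]);
  }
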
 static size_t
 ZSTD_decompressSequencesLong_default(ZSTD_DCtx* dctx,
                                  void* dst, size_t maxDstSize,
                                  const void* seqStart, size_t seqSize, int nbSeq,
-                                 const ZSTD_longOffset_e isLongOffset,
-                                 const int frame)
+                                 const ZSTD_longOffset_e isLongOffset)
 {
-    return ZSTD_decompressSequencesLong_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+    return ZSTD_decompressSequencesLong_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
 }
 #endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT */
 
@@ -1851,20 +1908,18 @@ DONT_VECTORIZE
 ZSTD_decompressSequences_bmi2(ZSTD_DCtx* dctx,
                                  void* dst, size_t maxDstSize,
                                  const void* seqStart, size_t seqSize, int nbSeq,
-                                 const ZSTD_longOffset_e isLongOffset,
-                                 const int frame)
+                                 const ZSTD_longOffset_e isLongOffset)
 {
-    return ZSTD_decompressSequences_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+    return ZSTD_decompressSequences_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
 }
 static BMI2_TARGET_ATTRIBUTE size_t
 DONT_VECTORIZE
 ZSTD_decompressSequencesSplitLitBuffer_bmi2(ZSTD_DCtx* dctx,
                                  void* dst, size_t maxDstSize,
                                  const void* seqStart, size_t seqSize, int nbSeq,
-                                 const ZSTD_longOffset_e isLongOffset,
-                                 const int frame)
+                                 const ZSTD_longOffset_e isLongOffset)
 {
-    return ZSTD_decompressSequences_bodySplitLitBuffer(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+    return ZSTD_decompressSequences_bodySplitLitBuffer(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
 }
 #endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG */
 
@@ -1873,50 +1928,40 @@ static BMI2_TARGET_ATTRIBUTE size_t
 ZSTD_decompressSequencesLong_bmi2(ZSTD_DCtx* dctx,
                                  void* dst, size_t maxDstSize,
                                  const void* seqStart, size_t seqSize, int nbSeq,
-                                 const ZSTD_longOffset_e isLongOffset,
-                                 const int frame)
+                                 const ZSTD_longOffset_e isLongOffset)
 {
-    return ZSTD_decompressSequencesLong_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+    return ZSTD_decompressSequencesLong_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
 }
 #endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT */
 
 #endif /* DYNAMIC_BMI2 */
 
-typedef size_t (*ZSTD_decompressSequences_t)(
-                            ZSTD_DCtx* dctx,
-                            void* dst, size_t maxDstSize,
-                            const void* seqStart, size_t seqSize, int nbSeq,
-                            const ZSTD_longOffset_e isLongOffset,
-                            const int frame);
-
 #ifndef ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG
 static size_t
 ZSTD_decompressSequences(ZSTD_DCtx* dctx, void* dst, size_t maxDstSize,
                    const void* seqStart, size_t seqSize, int nbSeq,
-                   const ZSTD_longOffset_e isLongOffset,
-                   const int frame)
+                   const ZSTD_longOffset_e isLongOffset)
 {
     DEBUGLOG(5, "ZSTD_decompressSequences");
 #if DYNAMIC_BMI2
     if (ZSTD_DCtx_get_bmi2(dctx)) {
-        return ZSTD_decompressSequences_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+        return ZSTD_decompressSequences_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
     }
 #endif
-    return ZSTD_decompressSequences_default(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+    return ZSTD_decompressSequences_default(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
 }
 static size_t
 ZSTD_decompressSequencesSplitLitBuffer(ZSTD_DCtx* dctx, void* dst, size_t maxDstSize,
                                const void* seqStart, size_t seqSize, int nbSeq,
-                               const ZSTD_longOffset_e isLongOffset,
-                               const int frame)
+                               const ZSTD_longOffset_e isLongOffset)
 {
     DEBUGLOG(5, "ZSTD_decompressSequencesSplitLitBuffer");
 #if DYNAMIC_BMI2
     if (ZSTD_DCtx_get_bmi2(dctx)) {
-        return ZSTD_decompressSequencesSplitLitBuffer_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+        return ZSTD_decompressSequencesSplitLitBuffer_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
     }
 #endif
-    return ZSTD_decompressSequencesSplitLitBuffer_default(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+    return ZSTD_decompressSequencesSplitLitBuffer_default(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
 }
 #endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG */
 
@@ -1931,69 +1976,114 @@ static size_t
 ZSTD_decompressSequencesLong(ZSTD_DCtx* dctx,
                              void* dst, size_t maxDstSize,
                              const void* seqStart, size_t seqSize, int nbSeq,
-                             const ZSTD_longOffset_e isLongOffset,
-                             const int frame)
+                             const ZSTD_longOffset_e isLongOffset)
 {
     DEBUGLOG(5, "ZSTD_decompressSequencesLong");
 #if DYNAMIC_BMI2
     if (ZSTD_DCtx_get_bmi2(dctx)) {
-        return ZSTD_decompressSequencesLong_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+        return ZSTD_decompressSequencesLong_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
     }
 #endif
-    return ZSTD_decompressSequencesLong_default(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+    return ZSTD_decompressSequencesLong_default(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
 }
 #endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT */
 
 
+/*
+ * @returns The total size of the history referenceable by zstd, including
+ * both the prefix and the extDict. At @p op any offset larger than this
+ * is invalid.
+ */
+static size_t ZSTD_totalHistorySize(BYTE* op, BYTE const* virtualStart)
+{
+    return (size_t)(op - virtualStart);
+}
+
+typedef struct {
+    unsigned longOffsetShare;
+    unsigned maxNbAdditionalBits;
+} ZSTD_OffsetInfo;
 
-#if !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT) && \
-    !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG)
-/* ZSTD_getLongOffsetsShare() :
+/* ZSTD_getOffsetInfo() :
  * condition : offTable must be valid
  * @return : "share" of long offsets (arbitrarily defined as > (1<<23))
- *           compared to maximum possible of (1<<OffFSELog) */
-static unsigned
-ZSTD_getLongOffsetsShare(const ZSTD_seqSymbol* offTable)
+ *           compared to maximum possible of (1<<OffFSELog),
+ *           as well as the maximum number of additional bits required.
+ */
+static ZSTD_OffsetInfo
+ZSTD_getOffsetInfo(const ZSTD_seqSymbol* offTable, int nbSeq)
 {
-    const void* ptr = offTable;
-    U32 const tableLog = ((const ZSTD_seqSymbol_header*)ptr)[0].tableLog;
-    const ZSTD_seqSymbol* table = offTable + 1;
-    U32 const max = 1 << tableLog;
-    U32 offCode, total = 0;
-    DEBUGLOG(5, "ZSTD_getLongOffsetsShare: (tableLog=%u)", tableLog);
+    ZSTD_OffsetInfo info = {0, 0};
+    /* If nbSeq == 0, then the offTable is uninitialized, but we have
+     * no sequences, so both values should be 0.
+     */
+    if (nbSeq != 0) {
+        const void* ptr = offTable;
+        U32 const tableLog = ((const ZSTD_seqSymbol_header*)ptr)[0].tableLog;
+        const ZSTD_seqSymbol* table = offTable + 1;
+        U32 const max = 1 << tableLog;
+        U32 u;
+        DEBUGLOG(5, "ZSTD_getLongOffsetsShare: (tableLog=%u)", tableLog);
 
-    assert(max <= (1 << OffFSELog));  /* max not too large */
-    for (offCode=0; offCode<max; offCode++) {
-        if (table[offCode].nbAdditionalBits > 22) total += 1;
+        assert(max <= (1 << OffFSELog));  /* max not too large */
+        for (u=0; u<max; u++) {
+            info.maxNbAdditionalBits = MAX(info.maxNbAdditionalBits, table[u].nbAdditionalBits);
+            if (table[u].nbAdditionalBits > 22) info.longOffsetShare += 1;
+        }
+
+        assert(tableLog <= OffFSELog);
+        info.longOffsetShare <<= (OffFSELog - tableLog);  /* scale to OffFSELog */
     }
 
-    assert(tableLog <= OffFSELog);
-    total <<= (OffFSELog - tableLog);  /* scale to OffFSELog */
 
-    return total;
+    return info;
+}
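A quick gloss on the two fields, since both consumers appear further down
(illustration only, not lines from the import):

    /* An offset code with nbAdditionalBits == N spans offsets on the order
     * of 2^N, so longOffsetShare counts the codes that can reach beyond
     * 1<<23 (8 MiB) of history, while maxNbAdditionalBits bounds the largest
     * offset the block can possibly produce. A table whose
     * maxNbAdditionalBits fits within STREAM_ACCUMULATOR_MIN can never need
     * the long-offset decoder, whatever the window size claims.
     */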
+/*
+ * @returns The maximum offset we can decode in one read of our bitstream, without
+ * reloading more bits in the middle of the offset bits read. Any offsets larger
+ * than this must use the long offset decoder.
+ */
+static size_t ZSTD_maxShortOffset(void)
+{
+    if (MEM_64bits()) {
+        /* We can decode any offset without reloading bits.
+         * This might change if the max window size grows.
+         */
+        ZSTD_STATIC_ASSERT(ZSTD_WINDOWLOG_MAX <= 31);
+        return (size_t)-1;
+    } else {
+        /* The maximum offBase is (1 << (STREAM_ACCUMULATOR_MIN + 1)) - 1.
+         * This offBase would require STREAM_ACCUMULATOR_MIN extra bits.
+         * Then we have to subtract ZSTD_REP_NUM to get the maximum possible offset.
+         */
+        size_t const maxOffbase = ((size_t)1 << (STREAM_ACCUMULATOR_MIN + 1)) - 1;
+        size_t const maxOffset = maxOffbase - ZSTD_REP_NUM;
+        assert(ZSTD_highbit32((U32)maxOffbase) == STREAM_ACCUMULATOR_MIN);
+        return maxOffset;
+    }
 }
-#endif
 
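To put concrete numbers on the 32-bit branch (assuming upstream's constants:
STREAM_ACCUMULATOR_MIN is 25 on 32-bit targets per bitstream.h, and
ZSTD_REP_NUM is 3):

    /* Worked example, constants assumed as stated above:
     *   maxOffbase = ((size_t)1 << (25 + 1)) - 1 = 67108863   (0x3FFFFFF)
     *   maxOffset  = 67108863 - 3                = 67108860   (~64 MiB)
     * So a 32-bit build must route any offset beyond ~64 MiB of history
     * through the long-offset decoder, while a 64-bit build (window log
     * capped at 31) can decode any legal offset in a single bitstream read.
     */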
 size_t
 ZSTD_decompressBlock_internal(ZSTD_DCtx* dctx,
                       void* dst, size_t dstCapacity,
-                      const void* src, size_t srcSize, const int frame, const streaming_operation streaming)
+                      const void* src, size_t srcSize, const streaming_operation streaming)
 {   /* blockType == blockCompressed */
     const BYTE* ip = (const BYTE*)src;
 
-    /* isLongOffset must be true if there are long offsets.
-     * Offsets are long if they are larger than 2^STREAM_ACCUMULATOR_MIN.
-     * We don't expect that to be the case in 64-bit mode.
-     * In block mode, window size is not known, so we have to be conservative.
-     * (note: but it could be evaluated from current-lowLimit)
-     */
-    ZSTD_longOffset_e const isLongOffset = (ZSTD_longOffset_e)(MEM_32bits() && (!frame || (dctx->fParams.windowSize > (1ULL << STREAM_ACCUMULATOR_MIN))));
-    DEBUGLOG(5, "ZSTD_decompressBlock_internal (size : %u)", (U32)srcSize);
-
-    RETURN_ERROR_IF(srcSize >= ZSTD_BLOCKSIZE_MAX, srcSize_wrong, "");
+    DEBUGLOG(5, "ZSTD_decompressBlock_internal (cSize : %u)", (unsigned)srcSize);
+
+    /* Note : the wording of the specification
+     * allows a compressed block to be sized exactly ZSTD_blockSizeMax(dctx).
+     * This generally does not happen, as it makes little sense,
+     * since an uncompressed block would feature the same size and have no decompression cost.
+     * Also, note that decoders from reference libzstd before v1.5.4
+     * would consider this edge case as an error.
+     * As a consequence, avoid generating compressed blocks of size ZSTD_blockSizeMax(dctx)
+     * for broader compatibility with the deployed ecosystem of zstd decoders */
+    RETURN_ERROR_IF(srcSize > ZSTD_blockSizeMax(dctx), srcSize_wrong, "");
 
     /* Decode literals section */
     {   size_t const litCSize = ZSTD_decodeLiteralsBlock(dctx, src, srcSize, dst, dstCapacity, streaming);
-        DEBUGLOG(5, "ZSTD_decodeLiteralsBlock : %u", (U32)litCSize);
+        DEBUGLOG(5, "ZSTD_decodeLiteralsBlock : cSize=%u, nbLiterals=%zu", (U32)litCSize, dctx->litSize);
         if (ZSTD_isError(litCSize)) return litCSize;
         ip += litCSize;
         srcSize -= litCSize;
@@ -2001,6 +2091,23 @@ ZSTD_decompressBlock_internal(ZSTD_DCtx* dctx,
 
     /* Build Decoding Tables */
     {
+        /* Compute the maximum block size, which must also work when !frame and fParams are unset.
+         * Additionally, take the min with dstCapacity to ensure that the totalHistorySize fits in a size_t.
+         */
+        size_t const blockSizeMax = MIN(dstCapacity, ZSTD_blockSizeMax(dctx));
+        size_t const totalHistorySize = ZSTD_totalHistorySize(ZSTD_maybeNullPtrAdd((BYTE*)dst, blockSizeMax), (BYTE const*)dctx->virtualStart);
+        /* isLongOffset must be true if there are long offsets.
+         * Offsets are long if they are larger than ZSTD_maxShortOffset().
+         * We don't expect that to be the case in 64-bit mode.
+         *
+         * We check here to see if our history is large enough to allow long offsets.
+         * If it isn't, then we can't possibly have (valid) long offsets. If the offset
+         * is invalid, then it is okay to read it incorrectly.
+         *
+         * If isLongOffset is true, then we will later check our decoding table to see
+         * if it is even possible to generate long offsets.
+         */
+        ZSTD_longOffset_e isLongOffset = (ZSTD_longOffset_e)(MEM_32bits() && (totalHistorySize > ZSTD_maxShortOffset()));
         /* These macros control at build-time which decompressor implementation
          * we use. If neither is defined, we do some inspection and dispatch at
          * runtime.
@@ -2008,6 +2115,11 @@ ZSTD_decompressBlock_internal(ZSTD_DCtx* dctx,
 #if !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT) && \
     !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG)
         int usePrefetchDecoder = dctx->ddictIsCold;
+#else
+        /* Set to 1 to avoid computing offset info if we don't need to.
+         * Otherwise this value is ignored.
+         */
+        int usePrefetchDecoder = 1;
 #endif
         int nbSeq;
         size_t const seqHSize = ZSTD_decodeSeqHeaders(dctx, &nbSeq, ip, srcSize);
@@ -2015,40 +2127,55 @@ ZSTD_decompressBlock_internal(ZSTD_DCtx* dctx,
         ip += seqHSize;
         srcSize -= seqHSize;
 
-        RETURN_ERROR_IF(dst == NULL && nbSeq > 0, dstSize_tooSmall, "NULL not handled");
+        RETURN_ERROR_IF((dst == NULL || dstCapacity == 0) && nbSeq > 0, dstSize_tooSmall, "NULL not handled");
+        RETURN_ERROR_IF(MEM_64bits() && sizeof(size_t) == sizeof(void*) && (size_t)(-1) - (size_t)dst < (size_t)(1 << 20), dstSize_tooSmall,
+                "invalid dst");
 
-#if !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT) && \
-    !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG)
-        if ( !usePrefetchDecoder
-          && (!frame || (dctx->fParams.windowSize > (1<<24)))
-          && (nbSeq>ADVANCED_SEQS) ) {  /* could probably use a larger nbSeq limit */
-            U32 const shareLongOffsets = ZSTD_getLongOffsetsShare(dctx->OFTptr);
-            U32 const minShare = MEM_64bits() ? 7 : 20; /* heuristic values, correspond to 2.73% and 7.81% */
-            usePrefetchDecoder = (shareLongOffsets >= minShare);
+        /* If we could potentially have long offsets, or we might want to use the prefetch decoder,
+         * compute information about the share of long offsets, and the maximum nbAdditionalBits.
+         * NOTE: could probably use a larger nbSeq limit
+         */
+        if (isLongOffset || (!usePrefetchDecoder && (totalHistorySize > (1u << 24)) && (nbSeq > 8))) {
+            ZSTD_OffsetInfo const info = ZSTD_getOffsetInfo(dctx->OFTptr, nbSeq);
+            if (isLongOffset && info.maxNbAdditionalBits <= STREAM_ACCUMULATOR_MIN) {
+                /* If isLongOffset, but the maximum number of additional bits that we see in our table is small
+                 * enough, then we know it is impossible to have too long an offset in this block, so we can
+                 * use the regular offset decoder.
+                 */
+                isLongOffset = ZSTD_lo_isRegularOffset;
+            }
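A hypothetical example of the downgrade above (numbers invented for
illustration):

            /* If the block's offset table tops out at
             * info.maxNbAdditionalBits == 20, the largest decodable offBase
             * stays below (1 << 21). That fits one 32-bit bitstream read
             * (20 <= STREAM_ACCUMULATOR_MIN == 25), so even with more than
             * 4 GiB of history the block is safely demoted to the cheaper
             * regular-offset decoder.
             */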
+            if (!usePrefetchDecoder) {
+                U32 const minShare = MEM_64bits() ? 7 : 20; /* heuristic values, correspond to 2.73% and 7.81% */
+                usePrefetchDecoder = (info.longOffsetShare >= minShare);
+            }
         }
-#endif
 
         dctx->ddictIsCold = 0;
 
 #if !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT) && \
     !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG)
-        if (usePrefetchDecoder)
+        if (usePrefetchDecoder) {
+#else
+        (void)usePrefetchDecoder;
+        {
 #endif
 #ifndef ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT
-            return ZSTD_decompressSequencesLong(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset, frame);
+            return ZSTD_decompressSequencesLong(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset);
 #endif
+        }
 
 #ifndef ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG
         /* else */
         if (dctx->litBufferLocation == ZSTD_split)
-            return ZSTD_decompressSequencesSplitLitBuffer(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset, frame);
+            return ZSTD_decompressSequencesSplitLitBuffer(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset);
         else
-            return ZSTD_decompressSequences(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset, frame);
+            return ZSTD_decompressSequences(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset);
 #endif
     }
 }
 
 
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
 void ZSTD_checkContinuity(ZSTD_DCtx* dctx, const void* dst, size_t dstSize)
 {
     if (dst != dctx->previousDstEnd && dstSize > 0) {   /* not contiguous */
@@ -2060,13 +2187,24 @@ void ZSTD_checkContinuity(ZSTD_DCtx* dctx, const void* dst, size_t dstSize)
 }
 
 
-size_t ZSTD_decompressBlock(ZSTD_DCtx* dctx,
-                            void* dst, size_t dstCapacity,
-                            const void* src, size_t srcSize)
+size_t ZSTD_decompressBlock_deprecated(ZSTD_DCtx* dctx,
+                                       void* dst, size_t dstCapacity,
+                                       const void* src, size_t srcSize)
 {
     size_t dSize;
+    dctx->isFrameDecompression = 0;
     ZSTD_checkContinuity(dctx, dst, dstCapacity);
-    dSize = ZSTD_decompressBlock_internal(dctx, dst, dstCapacity, src, srcSize, /* frame */ 0, not_streaming);
+    dSize = ZSTD_decompressBlock_internal(dctx, dst, dstCapacity, src, srcSize, not_streaming);
+    FORWARD_IF_ERROR(dSize, "");
     dctx->previousDstEnd = (char*)dst + dSize;
     return dSize;
 }
+
+
+/* NOTE: Must just wrap ZSTD_decompressBlock_deprecated() */
+size_t ZSTD_decompressBlock(ZSTD_DCtx* dctx,
+                            void* dst, size_t dstCapacity,
+                            const void* src, size_t srcSize)
+{
+    return ZSTD_decompressBlock_deprecated(dctx, dst, dstCapacity, src, srcSize);
+}
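For reference, the prefetch thresholds retained above scale against
OffFSELog (8), which is where the percentages in the comment come from
(sketch, restating the in-tree constants):

    /* longOffsetShare is scaled to a denominator of 1 << OffFSELog == 256:
     *   64-bit:  minShare = 7   ->  7 / 256 = 2.73% of offset codes beyond 1<<23
     *   32-bit:  minShare = 20  -> 20 / 256 = 7.81% before prefetching pays off
     */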
diff --git a/lib/zstd/decompress/zstd_decompress_block.h b/lib/zstd/decompress/zstd_decompress_block.h
index 3d2d57a5d25a..becffbd89364 100644
--- a/lib/zstd/decompress/zstd_decompress_block.h
+++ b/lib/zstd/decompress/zstd_decompress_block.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -47,7 +48,7 @@ typedef enum {
  */
 size_t ZSTD_decompressBlock_internal(ZSTD_DCtx* dctx,
                        void* dst, size_t dstCapacity,
-                       const void* src, size_t srcSize, const int frame, const streaming_operation streaming);
+                       const void* src, size_t srcSize, const streaming_operation streaming);
 
 /* ZSTD_buildFSETable() :
  * generate FSE decoding table for one symbol (ll, ml or off)
@@ -64,5 +65,10 @@ void ZSTD_buildFSETable(ZSTD_seqSymbol* dt,
             unsigned tableLog, void* wksp, size_t wkspSize,
             int bmi2);
 
+/* Internal definition of ZSTD_decompressBlock() to avoid deprecation warnings. */
+size_t ZSTD_decompressBlock_deprecated(ZSTD_DCtx* dctx,
+                   void* dst, size_t dstCapacity,
+                   const void* src, size_t srcSize);
+
 
 #endif /* ZSTD_DEC_BLOCK_H */
diff --git a/lib/zstd/decompress/zstd_decompress_internal.h b/lib/zstd/decompress/zstd_decompress_internal.h
index 98102edb6a83..2a225d1811c4 100644
--- a/lib/zstd/decompress/zstd_decompress_internal.h
+++ b/lib/zstd/decompress/zstd_decompress_internal.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -75,12 +76,13 @@ static UNUSED_ATTR const U32 ML_base[MaxML+1] = {
 
 #define ZSTD_BUILD_FSE_TABLE_WKSP_SIZE (sizeof(S16) * (MaxSeq + 1) + (1u << MaxFSELog) + sizeof(U64))
 #define ZSTD_BUILD_FSE_TABLE_WKSP_SIZE_U32 ((ZSTD_BUILD_FSE_TABLE_WKSP_SIZE + sizeof(U32) - 1) / sizeof(U32))
+#define ZSTD_HUFFDTABLE_CAPACITY_LOG 12
 
 typedef struct {
     ZSTD_seqSymbol LLTable[SEQSYMBOL_TABLE_SIZE(LLFSELog)];    /* Note : Space reserved for FSE Tables */
     ZSTD_seqSymbol OFTable[SEQSYMBOL_TABLE_SIZE(OffFSELog)];   /* is also used as temporary workspace while building hufTable during DDict creation */
     ZSTD_seqSymbol MLTable[SEQSYMBOL_TABLE_SIZE(MLFSELog)];    /* and therefore must be at least HUF_DECOMPRESS_WORKSPACE_SIZE large */
-    HUF_DTable hufTable[HUF_DTABLE_SIZE(HufLog)];  /* can accommodate HUF_decompress4X */
+    HUF_DTable hufTable[HUF_DTABLE_SIZE(ZSTD_HUFFDTABLE_CAPACITY_LOG)];  /* can accommodate HUF_decompress4X */
     U32 rep[ZSTD_REP_NUM];
     U32 workspace[ZSTD_BUILD_FSE_TABLE_WKSP_SIZE_U32];
 } ZSTD_entropyDTables_t;
@@ -135,7 +137,7 @@ struct ZSTD_DCtx_s
     const void* virtualStart;     /* virtual start of previous segment if it was just before current one */
     const void* dictEnd;          /* end of previous segment */
     size_t expected;
-    ZSTD_frameHeader fParams;
+    ZSTD_FrameHeader fParams;
     U64 processedCSize;
     U64 decodedSize;
     blockType_e bType;            /* used in ZSTD_decompressContinue(), store blockType between block header decoding and block decompression stages */
@@ -152,7 +154,8 @@ struct ZSTD_DCtx_s
     size_t litSize;
     size_t rleSize;
     size_t staticSize;
-#if DYNAMIC_BMI2 != 0
+    int isFrameDecompression;
+#if DYNAMIC_BMI2
     int bmi2;                     /* == 1 if the CPU supports BMI2 and 0 otherwise. CPU support is determined dynamically once per context lifetime. */
 #endif
 
@@ -164,6 +167,8 @@ struct ZSTD_DCtx_s
     ZSTD_dictUses_e dictUses;
     ZSTD_DDictHashSet* ddictSet;                    /* Hash set for multiple ddicts */
     ZSTD_refMultipleDDicts_e refMultipleDDicts;     /* User specified: if == 1, will allow references to multiple DDicts. Default == 0 (disabled) */
+    int disableHufAsm;
+    int maxBlockSizeParam;
 
     /* streaming */
     ZSTD_dStreamStage streamStage;
@@ -199,11 +204,11 @@ struct ZSTD_DCtx_s
 };  /* typedef'd to ZSTD_DCtx within "zstd.h" */
 
 MEM_STATIC int ZSTD_DCtx_get_bmi2(const struct ZSTD_DCtx_s *dctx) {
-#if DYNAMIC_BMI2 != 0
-	return dctx->bmi2;
+#if DYNAMIC_BMI2
+    return dctx->bmi2;
 #else
     (void)dctx;
-	return 0;
+    return 0;
 #endif
 }
 
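One sizing note on the new ZSTD_HUFFDTABLE_CAPACITY_LOG above, assuming
upstream's HUF_DTABLE_SIZE definition in huf.h:

    /* HUF_DTABLE_SIZE(maxTableLog) expands to (1 + (1 << maxTableLog)), so
     * HUF_DTABLE_SIZE(12) = 4097 HUF_DTable (U32) entries, roughly 16 KiB.
     * Pinning the capacity at 12 keeps the DCtx layout stable even if
     * upstream later changes HufLog.
     */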
diff --git a/lib/zstd/decompress_sources.h b/lib/zstd/decompress_sources.h
index a06ca187aab5..8a47eb2a4514 100644
--- a/lib/zstd/decompress_sources.h
+++ b/lib/zstd/decompress_sources.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
diff --git a/lib/zstd/zstd_common_module.c b/lib/zstd/zstd_common_module.c
index 22686e367e6f..466828e35752 100644
--- a/lib/zstd/zstd_common_module.c
+++ b/lib/zstd/zstd_common_module.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
 /*
- * Copyright (c) Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -24,9 +24,6 @@ EXPORT_SYMBOL_GPL(HUF_readStats_wksp);
 EXPORT_SYMBOL_GPL(ZSTD_isError);
 EXPORT_SYMBOL_GPL(ZSTD_getErrorName);
 EXPORT_SYMBOL_GPL(ZSTD_getErrorCode);
-EXPORT_SYMBOL_GPL(ZSTD_customMalloc);
-EXPORT_SYMBOL_GPL(ZSTD_customCalloc);
-EXPORT_SYMBOL_GPL(ZSTD_customFree);
 
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_DESCRIPTION("Zstd Common");
diff --git a/lib/zstd/zstd_compress_module.c b/lib/zstd/zstd_compress_module.c
index bd8784449b31..7651b53551c8 100644
--- a/lib/zstd/zstd_compress_module.c
+++ b/lib/zstd/zstd_compress_module.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
 /*
- * Copyright (c) Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -16,6 +16,7 @@
 
 #include "common/zstd_deps.h"
 #include "common/zstd_internal.h"
+#include "compress/zstd_compress_internal.h"
 
 #define ZSTD_FORWARD_IF_ERR(ret)            \
         do {                                \
@@ -92,12 +93,64 @@ zstd_compression_parameters zstd_get_cparams(int level,
 }
 EXPORT_SYMBOL(zstd_get_cparams);
 
+size_t zstd_cctx_set_param(zstd_cctx *cctx, ZSTD_cParameter param, int value)
+{
+        return ZSTD_CCtx_setParameter(cctx, param, value);
+}
+EXPORT_SYMBOL(zstd_cctx_set_param);
+
 size_t zstd_cctx_workspace_bound(const zstd_compression_parameters *cparams)
 {
         return ZSTD_estimateCCtxSize_usingCParams(*cparams);
 }
 EXPORT_SYMBOL(zstd_cctx_workspace_bound);
 
+// Used by zstd_cctx_workspace_bound_with_ext_seq_prod()
+static size_t dummy_external_sequence_producer(
+        void *sequenceProducerState,
+        ZSTD_Sequence *outSeqs, size_t outSeqsCapacity,
+        const void *src, size_t srcSize,
+        const void *dict, size_t dictSize,
+        int compressionLevel,
+        size_t windowSize)
+{
+        (void)sequenceProducerState;
+        (void)outSeqs; (void)outSeqsCapacity;
+        (void)src; (void)srcSize;
+        (void)dict; (void)dictSize;
+        (void)compressionLevel;
+        (void)windowSize;
+        return ZSTD_SEQUENCE_PRODUCER_ERROR;
+}
+
+static void init_cctx_params_from_compress_params(
+        ZSTD_CCtx_params *cctx_params,
+        const zstd_compression_parameters *compress_params)
+{
+        ZSTD_parameters zstd_params;
+        memset(&zstd_params, 0, sizeof(zstd_params));
+        zstd_params.cParams = *compress_params;
+        ZSTD_CCtxParams_init_advanced(cctx_params, zstd_params);
+}
+
+size_t zstd_cctx_workspace_bound_with_ext_seq_prod(const zstd_compression_parameters *compress_params)
+{
+        ZSTD_CCtx_params cctx_params;
+        init_cctx_params_from_compress_params(&cctx_params, compress_params);
+        ZSTD_CCtxParams_registerSequenceProducer(&cctx_params, NULL, dummy_external_sequence_producer);
+        return ZSTD_estimateCCtxSize_usingCCtxParams(&cctx_params);
+}
+EXPORT_SYMBOL(zstd_cctx_workspace_bound_with_ext_seq_prod);
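A minimal sketch of how the new exports are meant to compose, e.g. for a
QAT-style driver; my_hw_seq_prod, my_state, cparams, and workspace are
invented names here, and zstd_register_sequence_producer() is the wrapper
added further down in this file:

        /* Hypothetical hardware match finder, illustration only. */
        static size_t my_hw_seq_prod(void *state,
                        ZSTD_Sequence *outSeqs, size_t outSeqsCapacity,
                        const void *src, size_t srcSize,
                        const void *dict, size_t dictSize,
                        int compressionLevel, size_t windowSize)
        {
                /* Offload LZ match finding and fill outSeqs, returning the
                 * number of sequences produced; ZSTD_SEQUENCE_PRODUCER_ERROR
                 * makes zstd fall back to its software match finder. */
                return ZSTD_SEQUENCE_PRODUCER_ERROR;
        }

        /* Caller side: size the workspace with the producer accounted for,
         * then register the producer on the live cctx. */
        size_t const bound = zstd_cctx_workspace_bound_with_ext_seq_prod(&cparams);
        zstd_cctx *cctx = zstd_init_cctx(workspace, bound);
        zstd_register_sequence_producer(cctx, &my_state, my_hw_seq_prod);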
+size_t zstd_cstream_workspace_bound_with_ext_seq_prod(const zstd_compression_parameters *compress_params)
+{
+        ZSTD_CCtx_params cctx_params;
+        init_cctx_params_from_compress_params(&cctx_params, compress_params);
+        ZSTD_CCtxParams_registerSequenceProducer(&cctx_params, NULL, dummy_external_sequence_producer);
+        return ZSTD_estimateCStreamSize_usingCCtxParams(&cctx_params);
+}
+EXPORT_SYMBOL(zstd_cstream_workspace_bound_with_ext_seq_prod);
+
 zstd_cctx *zstd_init_cctx(void *workspace, size_t workspace_size)
 {
         if (workspace == NULL)
@@ -209,5 +262,25 @@ size_t zstd_end_stream(zstd_cstream *cstream, zstd_out_buffer *output)
 }
 EXPORT_SYMBOL(zstd_end_stream);
 
+void zstd_register_sequence_producer(
+        zstd_cctx *cctx,
+        void *sequence_producer_state,
+        zstd_sequence_producer_f sequence_producer
+) {
+        ZSTD_registerSequenceProducer(cctx, sequence_producer_state, sequence_producer);
+}
+EXPORT_SYMBOL(zstd_register_sequence_producer);
+
+size_t zstd_compress_sequences_and_literals(zstd_cctx *cctx, void *dst, size_t dst_capacity,
+        const zstd_sequence *in_seqs, size_t in_seqs_size,
+        const void *literals, size_t lit_size, size_t lit_capacity,
+        size_t decompressed_size)
+{
+        return ZSTD_compressSequencesAndLiterals(cctx, dst, dst_capacity, in_seqs,
+                                                 in_seqs_size, literals, lit_size,
+                                                 lit_capacity, decompressed_size);
+}
+EXPORT_SYMBOL(zstd_compress_sequences_and_literals);
+
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_DESCRIPTION("Zstd Compressor");
diff --git a/lib/zstd/zstd_decompress_module.c b/lib/zstd/zstd_decompress_module.c
index 469fc3059be0..0ae819f0c927 100644
--- a/lib/zstd/zstd_decompress_module.c
+++ b/lib/zstd/zstd_decompress_module.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
 /*
- * Copyright (c) Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -113,7 +113,7 @@ EXPORT_SYMBOL(zstd_init_dstream);
 
 size_t zstd_reset_dstream(zstd_dstream *dstream)
 {
-        return ZSTD_resetDStream(dstream);
+        return ZSTD_DCtx_reset(dstream, ZSTD_reset_session_only);
 }
 EXPORT_SYMBOL(zstd_reset_dstream);
 
-- 
2.48.1