From: Nick Terrell
To: linux-kernel@vger.kernel.org
Cc: Nick Terrell, Kernel Team, David Sterba
Subject: [PATCH 1/1] zstd: Import upstream v1.5.7
Date: Thu, 13 Mar 2025 13:59:21 -0700
Message-ID: <20250313205923.4105088-2-nickrterrell@gmail.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250313205923.4105088-1-nickrterrell@gmail.com>
References: <20250313205923.4105088-1-nickrterrell@gmail.com>

From: Nick Terrell

In addition to keeping the kernel's copy of zstd up to date, this update
was requested by Intel to expose upstream's APIs that allow QAT to
accelerate the LZ match finding stage of zstd.

This patch is imported from the upstream tag v1.5.7-kernel [0], which is
signed with upstream's signing key EF8FE99528B52FFD [1]. It was imported
from upstream using this command:

  export ZSTD=/path/to/repo/zstd/
  export LINUX=/path/to/repo/linux/
  cd "$ZSTD/contrib/linux-kernel"
  git checkout v1.5.7-kernel
  make import LINUX="$LINUX"

This patch has been tested on x86-64, and has been boot tested with a
zstd-compressed kernel & initramfs on i386 and aarch64. I benchmarked the
patch on x86-64 with gcc-14.2.1 on an Intel i9-9900K by measuring the
performance of compressed filesystem reads and writes.

Component, Level, Size delta, C. time delta, D. time delta
Btrfs    ,     1,     +0.00%,         -6.1%,         +1.4%
Btrfs    ,     3,     +0.00%,         -9.8%,         +3.0%
Btrfs    ,     5,     +0.00%,         +1.7%,         +1.4%
Btrfs    ,     7,     +0.00%,         -1.9%,         +2.7%
Btrfs    ,     9,     +0.00%,         -3.4%,         +3.7%
Btrfs    ,    15,     +0.00%,         -0.3%,         +3.6%
SquashFS ,     1,     +0.00%,           N/A,         +1.9%

The major changes that impact the kernel use cases for each version are:

v1.5.7: https://github.com/facebook/zstd/releases/tag/v1.5.7
* Add zstd_compress_sequences_and_literals() for use by Intel's QAT driver
  to implement Zstd compression acceleration in the kernel.
* Fix an underflow bug in 32-bit builds that can cause data corruption when
  processing more than 4GB of data with a single `ZSTD_CCtx` object, when an
  input crosses the 4GB boundary. I don't believe this impacts any current
  kernel use cases, because the `ZSTD_CCtx` is typically reconstructed
  between compressions.
* Levels 1-4 see 5-10% compression speed improvements for inputs smaller
  than 128KB.

v1.5.6: https://github.com/facebook/zstd/releases/tag/v1.5.6
* Improved compression ratio for the highest compression levels. I don't
  expect these to see much use, however, due to their slow speeds.
v1.5.5: https://github.com/facebook/zstd/releases/tag/v1.5.5
* Fix a rare corruption bug that can trigger on levels 13 and above.
* Improve compression speed of levels 5-11 on incompressible data.

v1.5.4: https://github.com/facebook/zstd/releases/tag/v1.5.4
* Improve compression speed of levels 5-11 on ARM.
* Improve dictionary compression speed.

Signed-off-by: Nick Terrell
---
 include/linux/zstd.h                         |   87 +-
 include/linux/zstd_errors.h                  |   30 +-
 include/linux/zstd_lib.h                     | 1123 ++++--
 lib/zstd/Makefile                            |    3 +-
 lib/zstd/common/allocations.h                |   56 +
 lib/zstd/common/bits.h                       |  150 +
 lib/zstd/common/bitstream.h                  |  155 +-
 lib/zstd/common/compiler.h                   |  151 +-
 lib/zstd/common/cpu.h                        |    3 +-
 lib/zstd/common/debug.c                      |    9 +-
 lib/zstd/common/debug.h                      |   37 +-
 lib/zstd/common/entropy_common.c             |   42 +-
 lib/zstd/common/error_private.c              |   13 +-
 lib/zstd/common/error_private.h              |   88 +-
 lib/zstd/common/fse.h                        |  103 +-
 lib/zstd/common/fse_decompress.c             |  132 +-
 lib/zstd/common/huf.h                        |  240 +-
 lib/zstd/common/mem.h                        |    3 +-
 lib/zstd/common/portability_macros.h         |   45 +-
 lib/zstd/common/zstd_common.c                |   38 +-
 lib/zstd/common/zstd_deps.h                  |   16 +-
 lib/zstd/common/zstd_internal.h              |  153 +-
 lib/zstd/compress/clevels.h                  |    3 +-
 lib/zstd/compress/fse_compress.c             |   74 +-
 lib/zstd/compress/hist.c                     |   13 +-
 lib/zstd/compress/hist.h                     |   10 +-
 lib/zstd/compress/huf_compress.c             |  441 ++-
 lib/zstd/compress/zstd_compress.c            | 3293 ++++++++++++-----
 lib/zstd/compress/zstd_compress_internal.h   |  621 +++-
 lib/zstd/compress/zstd_compress_literals.c   |  157 +-
 lib/zstd/compress/zstd_compress_literals.h   |   25 +-
 lib/zstd/compress/zstd_compress_sequences.c  |   21 +-
 lib/zstd/compress/zstd_compress_sequences.h  |   16 +-
 lib/zstd/compress/zstd_compress_superblock.c |  394 +-
 lib/zstd/compress/zstd_compress_superblock.h |    3 +-
 lib/zstd/compress/zstd_cwksp.h               |  222 +-
 lib/zstd/compress/zstd_double_fast.c         |  245 +-
 lib/zstd/compress/zstd_double_fast.h         |   27 +-
 lib/zstd/compress/zstd_fast.c                |  703 +++-
 lib/zstd/compress/zstd_fast.h                |   16 +-
 lib/zstd/compress/zstd_lazy.c                |  840 +++--
 lib/zstd/compress/zstd_lazy.h                |  195 +-
 lib/zstd/compress/zstd_ldm.c                 |  102 +-
 lib/zstd/compress/zstd_ldm.h                 |   17 +-
 lib/zstd/compress/zstd_ldm_geartab.h         |    3 +-
 lib/zstd/compress/zstd_opt.c                 |  571 +--
 lib/zstd/compress/zstd_opt.h                 |   55 +-
 lib/zstd/compress/zstd_preSplit.c            |  239 ++
 lib/zstd/compress/zstd_preSplit.h            |   34 +
 lib/zstd/decompress/huf_decompress.c         |  887 +++--
 lib/zstd/decompress/zstd_ddict.c             |    9 +-
 lib/zstd/decompress/zstd_ddict.h             |    3 +-
 lib/zstd/decompress/zstd_decompress.c        |  375 +-
 lib/zstd/decompress/zstd_decompress_block.c  |  724 ++--
 lib/zstd/decompress/zstd_decompress_block.h  |   10 +-
 .../decompress/zstd_decompress_internal.h    |   19 +-
 lib/zstd/decompress_sources.h                |    2 +-
 lib/zstd/zstd_common_module.c                |    5 +-
 lib/zstd/zstd_compress_module.c              |   75 +-
 lib/zstd/zstd_decompress_module.c            |    4 +-
 60 files changed, 8749 insertions(+), 4381 deletions(-)
 create mode 100644 lib/zstd/common/allocations.h
 create mode 100644 lib/zstd/common/bits.h
 create mode 100644 lib/zstd/compress/zstd_preSplit.c
 create mode 100644 lib/zstd/compress/zstd_preSplit.h
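For reviewers unfamiliar with the updated wrapper API, here is a minimal,
illustrative sketch of single-pass compression using only declarations
visible in the include/linux/zstd.h hunk below. The function name, the
level (3), and the caller-provided workspace are assumptions of the
example, not part of this patch:

/* Hypothetical caller; not part of this patch. */
static size_t example_compress(void *workspace, size_t workspace_size,
			       void *dst, size_t dst_capacity,
			       const void *src, size_t src_size)
{
	const zstd_parameters params = zstd_get_params(3, src_size);
	zstd_cctx *cctx;

	if (workspace_size < zstd_cctx_workspace_bound(&params.cParams))
		return 0;
	cctx = zstd_init_cctx(workspace, workspace_size);
	if (!cctx)
		return 0;
	/* Returns the compressed size, or an error per zstd_is_error(). */
	return zstd_compress_cctx(cctx, dst, dst_capacity, src, src_size,
				  &params);
}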
diff --git a/include/linux/zstd.h b/include/linux/zstd.h
index b2c7cf310c8f..2f2a3c8b8a33 100644
--- a/include/linux/zstd.h
+++ b/include/linux/zstd.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -160,7 +160,6 @@ typedef ZSTD_parameters zstd_parameters;
 zstd_parameters zstd_get_params(int level,
	unsigned long long estimated_src_size);
 
-
 /**
  * zstd_get_cparams() - returns zstd_compression_parameters for selected level
@@ -173,9 +172,20 @@ zstd_parameters zstd_get_params(int level,
 zstd_compression_parameters zstd_get_cparams(int level,
	unsigned long long estimated_src_size, size_t dict_size);
 
-/* ====== Single-pass Compression ====== */
-
 typedef ZSTD_CCtx zstd_cctx;
+typedef ZSTD_cParameter zstd_cparameter;
+
+/**
+ * zstd_cctx_set_param() - sets a compression parameter
+ * @cctx:   The context. Must have been initialized with zstd_init_cctx().
+ * @param:  The parameter to set.
+ * @value:  The value to set the parameter to.
+ *
+ * Return:  Zero or an error, which can be checked using zstd_is_error().
+ */
+size_t zstd_cctx_set_param(zstd_cctx *cctx, zstd_cparameter param, int value);
+
+/* ====== Single-pass Compression ====== */
 
 /**
  * zstd_cctx_workspace_bound() - max memory needed to initialize a zstd_cctx
@@ -190,6 +200,20 @@ typedef ZSTD_CCtx zstd_cctx;
  */
 size_t zstd_cctx_workspace_bound(const zstd_compression_parameters *parameters);
 
+/**
+ * zstd_cctx_workspace_bound_with_ext_seq_prod() - max memory needed to
+ * initialize a zstd_cctx when using the block-level external sequence
+ * producer API.
+ * @parameters: The compression parameters to be used.
+ *
+ * If multiple compression parameters might be used, the caller must call
+ * this function for each set of parameters and use the maximum size.
+ *
+ * Return: A lower bound on the size of the workspace that is passed to
+ *         zstd_init_cctx().
+ */
+size_t zstd_cctx_workspace_bound_with_ext_seq_prod(const zstd_compression_parameters *parameters);
+
 /**
  * zstd_init_cctx() - initialize a zstd compression context
  * @workspace:      The workspace to emplace the context into. It must outlive
@@ -424,6 +448,16 @@ typedef ZSTD_CStream zstd_cstream;
  */
 size_t zstd_cstream_workspace_bound(const zstd_compression_parameters *cparams);
 
+/**
+ * zstd_cstream_workspace_bound_with_ext_seq_prod() - memory needed to initialize
+ * a zstd_cstream when using the block-level external sequence producer API.
+ * @cparams: The compression parameters to be used for compression.
+ *
+ * Return: A lower bound on the size of the workspace that is passed to
+ *         zstd_init_cstream().
+ */
+size_t zstd_cstream_workspace_bound_with_ext_seq_prod(const zstd_compression_parameters *cparams);
+
 /**
  * zstd_init_cstream() - initialize a zstd streaming compression context
  * @parameters The zstd parameters to use for compression.
@@ -583,6 +617,18 @@ size_t zstd_decompress_stream(zstd_dstream *dstream, zstd_out_buffer *output,
  */
 size_t zstd_find_frame_compressed_size(const void *src, size_t src_size);
 
+/**
+ * zstd_register_sequence_producer() - exposes the zstd library function
+ * ZSTD_registerSequenceProducer(). This is used for the block-level external
+ * sequence producer API. See upstream zstd.h for detailed documentation.
+ */
+typedef ZSTD_sequenceProducer_F zstd_sequence_producer_f;
+void zstd_register_sequence_producer(
+	zstd_cctx *cctx,
+	void* sequence_producer_state,
+	zstd_sequence_producer_f sequence_producer
+);
+
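/*
 * Illustrative sketch (not part of the diff): how a driver such as QAT
 * might size its workspace and hook a block-level sequence producer into
 * a compression context. The producer callback and its state are
 * hypothetical, supplied by the driver; the callback must match the
 * zstd_sequence_producer_f type declared above.
 */
static zstd_cctx *example_init_with_seq_prod(void *workspace,
		size_t workspace_size,
		const zstd_compression_parameters *cparams,
		void *producer_state,
		zstd_sequence_producer_f producer)
{
	zstd_cctx *cctx;

	/* The larger, producer-aware workspace bound must be used here. */
	if (workspace_size <
	    zstd_cctx_workspace_bound_with_ext_seq_prod(cparams))
		return NULL;
	cctx = zstd_init_cctx(workspace, workspace_size);
	if (cctx)
		zstd_register_sequence_producer(cctx, producer_state,
						producer);
	return cctx;
}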
 /**
  * struct zstd_frame_params - zstd frame parameters stored in the frame header
  * @frameContentSize: The frame content size, or ZSTD_CONTENTSIZE_UNKNOWN if not
@@ -596,7 +642,7 @@ size_t zstd_find_frame_compressed_size(const void *src, size_t src_size);
  *
  * See zstd_lib.h.
  */
-typedef ZSTD_frameHeader zstd_frame_header;
+typedef ZSTD_FrameHeader zstd_frame_header;
 
 /**
  * zstd_get_frame_header() - extracts parameters from a zstd or skippable frame
@@ -611,4 +657,35 @@ typedef ZSTD_frameHeader zstd_frame_header;
 size_t zstd_get_frame_header(zstd_frame_header *params, const void *src,
	size_t src_size);
 
+/**
+ * struct zstd_sequence - a sequence of literals or a match
+ *
+ * @offset:      The offset of the match
+ * @litLength:   The literal length of the sequence
+ * @matchLength: The match length of the sequence
+ * @rep:         Represents which repeat offset is used
+ */
+typedef ZSTD_Sequence zstd_sequence;
+
+/**
+ * zstd_compress_sequences_and_literals() - compress an array of zstd_sequence and literals
+ *
+ * @cctx:              The zstd compression context.
+ * @dst:               The buffer to compress the data into.
+ * @dst_capacity:      The size of the destination buffer.
+ * @in_seqs:           The array of zstd_sequence to compress.
+ * @in_seqs_size:      The number of sequences in in_seqs.
+ * @literals:          The literals associated to the sequences to be compressed.
+ * @lit_size:          The size of the literals in the literals buffer.
+ * @lit_capacity:      The size of the literals buffer.
+ * @decompressed_size: The size of the input data
+ *
+ * Return: The compressed size or an error, which can be checked using
+ *         zstd_is_error().
+ */
+size_t zstd_compress_sequences_and_literals(zstd_cctx *cctx, void* dst, size_t dst_capacity,
+	const zstd_sequence *in_seqs, size_t in_seqs_size,
+	const void* literals, size_t lit_size, size_t lit_capacity,
+	size_t decompressed_size);
+
 #endif /* LINUX_ZSTD_H */
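/*
 * Illustrative sketch (not part of the diff): once an accelerator has
 * produced the sequences and the matching literals for an input, the
 * kernel side asks zstd to entropy-code them into a complete frame.
 * The function and buffer names are hypothetical; on error a caller
 * would typically fall back to software compression.
 */
static size_t example_emit_frame(zstd_cctx *cctx, void *dst,
		size_t dst_capacity,
		const zstd_sequence *seqs, size_t nb_seqs,
		const void *literals, size_t lit_size, size_t lit_capacity,
		size_t src_size)
{
	size_t ret;

	ret = zstd_compress_sequences_and_literals(cctx, dst, dst_capacity,
			seqs, nb_seqs, literals, lit_size, lit_capacity,
			src_size);
	/* ret is the size of a complete zstd frame, or an error code. */
	return zstd_is_error(ret) ? 0 : ret;
}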
diff --git a/include/linux/zstd_errors.h b/include/linux/zstd_errors.h
index 58b6dd45a969..c307fb011132 100644
--- a/include/linux/zstd_errors.h
+++ b/include/linux/zstd_errors.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -12,13 +13,18 @@
 #define ZSTD_ERRORS_H_398273423
 
 
-/*===== dependency =====*/
-#include   /* size_t */
+/* =====   ZSTDERRORLIB_API : control library symbols visibility   ===== */
+#define ZSTDERRORLIB_VISIBLE
 
+#ifndef ZSTDERRORLIB_HIDDEN
+#  if (__GNUC__ >= 4) && !defined(__MINGW32__)
+#    define ZSTDERRORLIB_HIDDEN __attribute__ ((visibility ("hidden")))
+#  else
+#    define ZSTDERRORLIB_HIDDEN
+#  endif
+#endif
 
-/* =====   ZSTDERRORLIB_API : control library symbols visibility   ===== */
-#define ZSTDERRORLIB_VISIBILITY
-#define ZSTDERRORLIB_API ZSTDERRORLIB_VISIBILITY
+#define ZSTDERRORLIB_API ZSTDERRORLIB_VISIBLE
 
 /*-*********************************************
  *  Error codes list
@@ -43,14 +49,18 @@ typedef enum {
   ZSTD_error_frameParameter_windowTooLarge = 16,
   ZSTD_error_corruption_detected = 20,
   ZSTD_error_checksum_wrong = 22,
+  ZSTD_error_literals_headerWrong = 24,
   ZSTD_error_dictionary_corrupted = 30,
   ZSTD_error_dictionary_wrong = 32,
   ZSTD_error_dictionaryCreation_failed = 34,
   ZSTD_error_parameter_unsupported = 40,
+  ZSTD_error_parameter_combination_unsupported = 41,
   ZSTD_error_parameter_outOfBound = 42,
   ZSTD_error_tableLog_tooLarge = 44,
   ZSTD_error_maxSymbolValue_tooLarge = 46,
   ZSTD_error_maxSymbolValue_tooSmall = 48,
+  ZSTD_error_cannotProduce_uncompressedBlock = 49,
+  ZSTD_error_stabilityCondition_notRespected = 50,
   ZSTD_error_stage_wrong = 60,
   ZSTD_error_init_missing = 62,
   ZSTD_error_memory_allocation = 64,
@@ -58,18 +68,18 @@ typedef enum {
   ZSTD_error_dstSize_tooSmall = 70,
   ZSTD_error_srcSize_wrong = 72,
   ZSTD_error_dstBuffer_null = 74,
+  ZSTD_error_noForwardProgress_destFull = 80,
+  ZSTD_error_noForwardProgress_inputEmpty = 82,
   /* following error codes are __NOT STABLE__, they can be removed or changed in future versions */
   ZSTD_error_frameIndex_tooLarge = 100,
   ZSTD_error_seekableIO = 102,
   ZSTD_error_dstBuffer_wrong = 104,
   ZSTD_error_srcBuffer_wrong = 105,
+  ZSTD_error_sequenceProducer_failed = 106,
+  ZSTD_error_externalSequences_invalid = 107,
   ZSTD_error_maxCode = 120  /* never EVER use this value directly, it can change in future versions! Use ZSTD_isError() instead */
 } ZSTD_ErrorCode;
 
-/*! ZSTD_getErrorCode() :
-    convert a `size_t` function result into a `ZSTD_ErrorCode` enum type,
-    which can be used to compare with enum list published above */
-ZSTDERRORLIB_API ZSTD_ErrorCode ZSTD_getErrorCode(size_t functionResult);
 ZSTDERRORLIB_API const char* ZSTD_getErrorString(ZSTD_ErrorCode code);   /*< Same as ZSTD_getErrorName, but using a `ZSTD_ErrorCode` enum argument */
 
 
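/*
 * Illustrative sketch (not part of the diff): ZSTD_getErrorCode() moves
 * from zstd_errors.h into zstd_lib.h, but mapping a size_t result onto
 * the stable error enum is unchanged. example_check() is hypothetical;
 * pr_warn() assumes <linux/printk.h> and the errno values are this
 * example's own choice of mapping.
 */
static int example_check(size_t result)
{
	if (!ZSTD_isError(result))
		return 0;
	pr_warn("zstd: %s\n",
		ZSTD_getErrorString(ZSTD_getErrorCode(result)));
	return ZSTD_getErrorCode(result) == ZSTD_error_dstSize_tooSmall ?
	       -ENOSPC : -EINVAL;
}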
diff --git a/include/linux/zstd_lib.h b/include/linux/zstd_lib.h
index 79d55465d5c1..e295d4125dde 100644
--- a/include/linux/zstd_lib.h
+++ b/include/linux/zstd_lib.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -11,23 +12,47 @@
 #ifndef ZSTD_H_235446
 #define ZSTD_H_235446
 
-/* ======   Dependency   ======*/
-#include   /* INT_MAX */
+
+/* ======   Dependencies   ======*/
 #include   /* size_t */
 
+#include   /* list of errors */
+#if !defined(ZSTD_H_ZSTD_STATIC_LINKING_ONLY)
+#include   /* INT_MAX */
+#endif /* ZSTD_STATIC_LINKING_ONLY */
+
 
 /* =====   ZSTDLIB_API : control library symbols visibility   ===== */
-#ifndef ZSTDLIB_VISIBLE
+#define ZSTDLIB_VISIBLE
+
+#ifndef ZSTDLIB_HIDDEN
 #  if (__GNUC__ >= 4) && !defined(__MINGW32__)
-#    define ZSTDLIB_VISIBLE __attribute__ ((visibility ("default")))
 #    define ZSTDLIB_HIDDEN __attribute__ ((visibility ("hidden")))
 #  else
-#    define ZSTDLIB_VISIBLE
 #    define ZSTDLIB_HIDDEN
 #  endif
 #endif
+
 #define ZSTDLIB_API ZSTDLIB_VISIBLE
 
+/* Deprecation warnings :
+ * Should these warnings be a problem, it is generally possible to disable them,
+ * typically with -Wno-deprecated-declarations for gcc or _CRT_SECURE_NO_WARNINGS in Visual.
+ * Otherwise, it's also possible to define ZSTD_DISABLE_DEPRECATE_WARNINGS.
+ */
+#ifdef ZSTD_DISABLE_DEPRECATE_WARNINGS
+#  define ZSTD_DEPRECATED(message) /* disable deprecation warnings */
+#else
+#  if (defined(GNUC) && (GNUC > 4 || (GNUC == 4 && GNUC_MINOR >= 5))) || defined(__clang__) || defined(__IAR_SYSTEMS_ICC__)
+#    define ZSTD_DEPRECATED(message) __attribute__((deprecated(message)))
+#  elif (__GNUC__ >= 3)
+#    define ZSTD_DEPRECATED(message) __attribute__((deprecated))
+#  else
+#    pragma message("WARNING: You need to implement ZSTD_DEPRECATED for this compiler")
+#    define ZSTD_DEPRECATED(message)
+#  endif
+#endif /* ZSTD_DISABLE_DEPRECATE_WARNINGS */
+
 
 /* ******************************************************************************
    Introduction
@@ -65,7 +90,7 @@
 /*------   Version   ------*/
 #define ZSTD_VERSION_MAJOR    1
 #define ZSTD_VERSION_MINOR    5
-#define ZSTD_VERSION_RELEASE  2
+#define ZSTD_VERSION_RELEASE  7
 #define ZSTD_VERSION_NUMBER  (ZSTD_VERSION_MAJOR *100*100 + ZSTD_VERSION_MINOR *100 + ZSTD_VERSION_RELEASE)
 
 /*! ZSTD_versionNumber() :
@@ -103,11 +128,12 @@ ZSTDLIB_API const char* ZSTD_versionString(void);
 
 
 /* *************************************
-*  Simple API
+*  Simple Core API
 ***************************************/
 /*! ZSTD_compress() :
  *  Compresses `src` content as a single zstd compressed frame into already allocated `dst`.
- *  Hint : compression runs faster if `dstCapacity` >= `ZSTD_compressBound(srcSize)`.
+ *  NOTE: Providing `dstCapacity >= ZSTD_compressBound(srcSize)` guarantees that zstd will have
+ *        enough space to successfully compress the data.
  *  @return : compressed size written into `dst` (<= `dstCapacity),
  *            or an error code if it fails (which can be tested using ZSTD_isError()). */
 ZSTDLIB_API size_t ZSTD_compress( void* dst, size_t dstCapacity,
@@ -115,47 +141,55 @@ ZSTDLIB_API size_t ZSTD_compress( void* dst, size_t dstCapacity,
                             int compressionLevel);
 
 /*! ZSTD_decompress() :
- * `compressedSize` : must be the _exact_ size of some number of compressed and/or skippable frames.
- * `dstCapacity` is an upper bound of originalSize to regenerate.
- * If user cannot imply a maximum upper bound, it's better to use streaming mode to decompress data.
- * @return : the number of bytes decompressed into `dst` (<= `dstCapacity`),
- *           or an errorCode if it fails (which can be tested using ZSTD_isError()).
*/ + * `compressedSize` : must be the _exact_ size of some number of compresse= d and/or skippable frames. + * Multiple compressed frames can be decompressed at once with this metho= d. + * The result will be the concatenation of all decompressed frames, back = to back. + * `dstCapacity` is an upper bound of originalSize to regenerate. + * First frame's decompressed size can be extracted using ZSTD_getFrameCo= ntentSize(). + * If maximum upper bound isn't known, prefer using streaming mode to dec= ompress data. + * @return : the number of bytes decompressed into `dst` (<=3D `dstCapacit= y`), + * or an errorCode if it fails (which can be tested using ZSTD_i= sError()). */ ZSTDLIB_API size_t ZSTD_decompress( void* dst, size_t dstCapacity, const void* src, size_t compressedSize); =20 + +/*=3D=3D=3D=3D=3D=3D Decompression helper functions =3D=3D=3D=3D=3D=3D*/ + /*! ZSTD_getFrameContentSize() : requires v1.3.0+ - * `src` should point to the start of a ZSTD encoded frame. - * `srcSize` must be at least as large as the frame header. - * hint : any size >=3D `ZSTD_frameHeaderSize_max` is large eno= ugh. - * @return : - decompressed size of `src` frame content, if known - * - ZSTD_CONTENTSIZE_UNKNOWN if the size cannot be determined - * - ZSTD_CONTENTSIZE_ERROR if an error occurred (e.g. invalid = magic number, srcSize too small) - * note 1 : a 0 return value means the frame is valid but "empty". - * note 2 : decompressed size is an optional field, it may not be presen= t, typically in streaming mode. - * When `return=3D=3DZSTD_CONTENTSIZE_UNKNOWN`, data to decompr= ess could be any size. - * In which case, it's necessary to use streaming mode to decom= press data. - * Optionally, application can rely on some implicit limit, - * as ZSTD_decompress() only needs an upper bound of decompress= ed size. - * (For example, data could be necessarily cut into blocks <=3D= 16 KB). - * note 3 : decompressed size is always present when compression is comp= leted using single-pass functions, - * such as ZSTD_compress(), ZSTD_compressCCtx() ZSTD_compress_u= singDict() or ZSTD_compress_usingCDict(). - * note 4 : decompressed size can be very large (64-bits value), - * potentially larger than what local system can handle as a si= ngle memory segment. - * In which case, it's necessary to use streaming mode to decom= press data. - * note 5 : If source is untrusted, decompressed size could be wrong or = intentionally modified. - * Always ensure return value fits within application's authori= zed limits. - * Each application can set its own limits. - * note 6 : This function replaces ZSTD_getDecompressedSize() */ + * `src` should point to the start of a ZSTD encoded frame. + * `srcSize` must be at least as large as the frame header. + * hint : any size >=3D `ZSTD_frameHeaderSize_max` is large enou= gh. + * @return : - decompressed size of `src` frame content, if known + * - ZSTD_CONTENTSIZE_UNKNOWN if the size cannot be determined + * - ZSTD_CONTENTSIZE_ERROR if an error occurred (e.g. invalid m= agic number, srcSize too small) + * note 1 : a 0 return value means the frame is valid but "empty". + * When invoking this method on a skippable frame, it will retur= n 0. + * note 2 : decompressed size is an optional field, it may not be present= (typically in streaming mode). + * When `return=3D=3DZSTD_CONTENTSIZE_UNKNOWN`, data to decompre= ss could be any size. + * In which case, it's necessary to use streaming mode to decomp= ress data. 
+ * Optionally, application can rely on some implicit limit, + * as ZSTD_decompress() only needs an upper bound of decompresse= d size. + * (For example, data could be necessarily cut into blocks <=3D = 16 KB). + * note 3 : decompressed size is always present when compression is compl= eted using single-pass functions, + * such as ZSTD_compress(), ZSTD_compressCCtx() ZSTD_compress_us= ingDict() or ZSTD_compress_usingCDict(). + * note 4 : decompressed size can be very large (64-bits value), + * potentially larger than what local system can handle as a sin= gle memory segment. + * In which case, it's necessary to use streaming mode to decomp= ress data. + * note 5 : If source is untrusted, decompressed size could be wrong or i= ntentionally modified. + * Always ensure return value fits within application's authoriz= ed limits. + * Each application can set its own limits. + * note 6 : This function replaces ZSTD_getDecompressedSize() */ #define ZSTD_CONTENTSIZE_UNKNOWN (0ULL - 1) #define ZSTD_CONTENTSIZE_ERROR (0ULL - 2) ZSTDLIB_API unsigned long long ZSTD_getFrameContentSize(const void *src, s= ize_t srcSize); =20 -/*! ZSTD_getDecompressedSize() : - * NOTE: This function is now obsolete, in favor of ZSTD_getFrameContentS= ize(). +/*! ZSTD_getDecompressedSize() (obsolete): + * This function is now obsolete, in favor of ZSTD_getFrameContentSize(). * Both functions work the same way, but ZSTD_getDecompressedSize() blends * "empty", "unknown" and "error" results to the same return value (0), * while ZSTD_getFrameContentSize() gives them separate return values. * @return : decompressed size of `src` frame content _if known and not em= pty_, 0 otherwise. */ +ZSTD_DEPRECATED("Replaced by ZSTD_getFrameContentSize") ZSTDLIB_API unsigned long long ZSTD_getDecompressedSize(const void* src, s= ize_t srcSize); =20 /*! ZSTD_findFrameCompressedSize() : Requires v1.4.0+ @@ -163,18 +197,50 @@ ZSTDLIB_API unsigned long long ZSTD_getDecompressedSi= ze(const void* src, size_t * `srcSize` must be >=3D first frame size * @return : the compressed size of the first frame starting at `src`, * suitable to pass as `srcSize` to `ZSTD_decompress` or similar, - * or an error code if input is invalid */ + * or an error code if input is invalid + * Note 1: this method is called _find*() because it's not enough to read= the header, + * it may have to scan through the frame's content, to reach its = end. + * Note 2: this method also works with Skippable Frames. In which case, + * it returns the size of the complete skippable frame, + * which is always equal to its content size + 8 bytes for header= s. */ ZSTDLIB_API size_t ZSTD_findFrameCompressedSize(const void* src, size_t sr= cSize); =20 =20 -/*=3D=3D=3D=3D=3D=3D Helper functions =3D=3D=3D=3D=3D=3D*/ -#define ZSTD_COMPRESSBOUND(srcSize) ((srcSize) + ((srcSize)>>8) + (((src= Size) < (128<<10)) ? 
(((128<<10) - (srcSize)) >> 11) /* margin, from 64 to = 0 */ : 0)) /* this formula ensures that bound(A) + bound(B) <=3D bound(A+B= ) as long as A and B >=3D 128 KB */ -ZSTDLIB_API size_t ZSTD_compressBound(size_t srcSize); /*!< maximum c= ompressed size in worst case single-pass scenario */ -ZSTDLIB_API unsigned ZSTD_isError(size_t code); /*!< tells if = a `size_t` function result is an error code */ -ZSTDLIB_API const char* ZSTD_getErrorName(size_t code); /*!< provides = readable string from an error code */ -ZSTDLIB_API int ZSTD_minCLevel(void); /*!< minimum n= egative compression level allowed, requires v1.4.0+ */ -ZSTDLIB_API int ZSTD_maxCLevel(void); /*!< maximum c= ompression level available */ -ZSTDLIB_API int ZSTD_defaultCLevel(void); /*!< default c= ompression level, specified by ZSTD_CLEVEL_DEFAULT, requires v1.5.0+ */ +/*=3D=3D=3D=3D=3D=3D Compression helper functions =3D=3D=3D=3D=3D=3D*/ + +/*! ZSTD_compressBound() : + * maximum compressed size in worst case single-pass scenario. + * When invoking `ZSTD_compress()`, or any other one-pass compression func= tion, + * it's recommended to provide @dstCapacity >=3D ZSTD_compressBound(srcSiz= e) + * as it eliminates one potential failure scenario, + * aka not enough room in dst buffer to write the compressed frame. + * Note : ZSTD_compressBound() itself can fail, if @srcSize >=3D ZSTD_MAX_= INPUT_SIZE . + * In which case, ZSTD_compressBound() will return an error code + * which can be tested using ZSTD_isError(). + * + * ZSTD_COMPRESSBOUND() : + * same as ZSTD_compressBound(), but as a macro. + * It can be used to produce constants, which can be useful for static all= ocation, + * for example to size a static array on stack. + * Will produce constant value 0 if srcSize is too large. + */ +#define ZSTD_MAX_INPUT_SIZE ((sizeof(size_t)=3D=3D8) ? 0xFF00FF00FF00FF00U= LL : 0xFF00FF00U) +#define ZSTD_COMPRESSBOUND(srcSize) (((size_t)(srcSize) >=3D ZSTD_MAX_IN= PUT_SIZE) ? 0 : (srcSize) + ((srcSize)>>8) + (((srcSize) < (128<<10)) ? (((= 128<<10) - (srcSize)) >> 11) /* margin, from 64 to 0 */ : 0)) /* this form= ula ensures that bound(A) + bound(B) <=3D bound(A+B) as long as A and B >= =3D 128 KB */ +ZSTDLIB_API size_t ZSTD_compressBound(size_t srcSize); /*!< maximum compre= ssed size in worst case single-pass scenario */ + + +/*=3D=3D=3D=3D=3D=3D Error helper functions =3D=3D=3D=3D=3D=3D*/ +/* ZSTD_isError() : + * Most ZSTD_* functions returning a size_t value can be tested for error, + * using ZSTD_isError(). 
+ * @return 1 if error, 0 otherwise + */ +ZSTDLIB_API unsigned ZSTD_isError(size_t result); /*!< tells if a= `size_t` function result is an error code */ +ZSTDLIB_API ZSTD_ErrorCode ZSTD_getErrorCode(size_t functionResult); /* co= nvert a result into an error code, which can be compared to error enum list= */ +ZSTDLIB_API const char* ZSTD_getErrorName(size_t result); /*!< provides r= eadable string from a function result */ +ZSTDLIB_API int ZSTD_minCLevel(void); /*!< minimum ne= gative compression level allowed, requires v1.4.0+ */ +ZSTDLIB_API int ZSTD_maxCLevel(void); /*!< maximum co= mpression level available */ +ZSTDLIB_API int ZSTD_defaultCLevel(void); /*!< default co= mpression level, specified by ZSTD_CLEVEL_DEFAULT, requires v1.5.0+ */ =20 =20 /* ************************************* @@ -182,25 +248,25 @@ ZSTDLIB_API int ZSTD_defaultCLevel(void); = /*!< default compres ***************************************/ /*=3D Compression context * When compressing many times, - * it is recommended to allocate a context just once, - * and re-use it for each successive compression operation. - * This will make workload friendlier for system's memory. + * it is recommended to allocate a compression context just once, + * and reuse it for each successive compression operation. + * This will make the workload easier for system's memory. * Note : re-using context is just a speed / resource optimization. * It doesn't change the compression ratio, which remains identica= l. - * Note 2 : In multi-threaded environments, - * use one different context per thread for parallel execution. + * Note 2: For parallel execution in multi-threaded environments, + * use one different context per thread . */ typedef struct ZSTD_CCtx_s ZSTD_CCtx; ZSTDLIB_API ZSTD_CCtx* ZSTD_createCCtx(void); -ZSTDLIB_API size_t ZSTD_freeCCtx(ZSTD_CCtx* cctx); /* accept NULL poi= nter */ +ZSTDLIB_API size_t ZSTD_freeCCtx(ZSTD_CCtx* cctx); /* compatible with= NULL pointer */ =20 /*! ZSTD_compressCCtx() : * Same as ZSTD_compress(), using an explicit ZSTD_CCtx. - * Important : in order to behave similarly to `ZSTD_compress()`, - * this function compresses at requested compression level, - * __ignoring any other parameter__ . + * Important : in order to mirror `ZSTD_compress()` behavior, + * this function compresses at the requested compression level, + * __ignoring any other advanced parameter__ . * If any advanced parameter was set using the advanced API, - * they will all be reset. Only `compressionLevel` remains. + * they will all be reset. Only @compressionLevel remains. */ ZSTDLIB_API size_t ZSTD_compressCCtx(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, @@ -210,7 +276,7 @@ ZSTDLIB_API size_t ZSTD_compressCCtx(ZSTD_CCtx* cctx, /*=3D Decompression context * When decompressing many times, * it is recommended to allocate a context only once, - * and re-use it for each successive compression operation. + * and reuse it for each successive compression operation. * This will make workload friendlier for system's memory. * Use one context per thread for parallel execution. */ typedef struct ZSTD_DCtx_s ZSTD_DCtx; @@ -220,7 +286,7 @@ ZSTDLIB_API size_t ZSTD_freeDCtx(ZSTD_DCtx* dctx); = /* accept NULL pointer * /*! ZSTD_decompressDCtx() : * Same as ZSTD_decompress(), * requires an allocated ZSTD_DCtx. - * Compatible with sticky parameters. + * Compatible with sticky parameters (see below). 
*/ ZSTDLIB_API size_t ZSTD_decompressDCtx(ZSTD_DCtx* dctx, void* dst, size_t dstCapacity, @@ -236,12 +302,12 @@ ZSTDLIB_API size_t ZSTD_decompressDCtx(ZSTD_DCtx* dct= x, * using ZSTD_CCtx_set*() functions. * Pushed parameters are sticky : they are valid for next compressed fra= me, and any subsequent frame. * "sticky" parameters are applicable to `ZSTD_compress2()` and `ZSTD_co= mpressStream*()` ! - * __They do not apply to "simple" one-shot variants such as ZSTD_compre= ssCCtx()__ . + * __They do not apply to one-shot variants such as ZSTD_compressCCtx()_= _ . * * It's possible to reset all parameters to "default" using ZSTD_CCtx_re= set(). * * This API supersedes all other "advanced" API entry points in the expe= rimental section. - * In the future, we expect to remove from experimental API entry points= which are redundant with this API. + * In the future, we expect to remove API entry points from experimental= which are redundant with this API. */ =20 =20 @@ -324,6 +390,19 @@ typedef enum { * The higher the value of selected strategy,= the more complex it is, * resulting in stronger and slower compressi= on. * Special: value 0 means "use default strate= gy". */ + + ZSTD_c_targetCBlockSize=3D130, /* v1.5.6+ + * Attempts to fit compressed block size = into approximately targetCBlockSize. + * Bound by ZSTD_TARGETCBLOCKSIZE_MIN and= ZSTD_TARGETCBLOCKSIZE_MAX. + * Note that it's not a guarantee, just a= convergence target (default:0). + * No target when targetCBlockSize =3D=3D= 0. + * This is helpful in low bandwidth strea= ming environments to improve end-to-end latency, + * when a client can make use of partial = documents (a prominent example being Chrome). + * Note: this parameter is stable since v= 1.5.6. + * It was present as an experimental para= meter in earlier versions, + * but it's not recommended using it with= earlier library versions + * due to massive performance regressions. + */ /* LDM mode parameters */ ZSTD_c_enableLongDistanceMatching=3D160, /* Enable long distance match= ing. * This parameter is designed to impro= ve compression ratio @@ -403,15 +482,18 @@ typedef enum { * ZSTD_c_forceMaxWindow * ZSTD_c_forceAttachDict * ZSTD_c_literalCompressionMode - * ZSTD_c_targetCBlockSize * ZSTD_c_srcSizeHint * ZSTD_c_enableDedicatedDictSearch * ZSTD_c_stableInBuffer * ZSTD_c_stableOutBuffer * ZSTD_c_blockDelimiters * ZSTD_c_validateSequences - * ZSTD_c_useBlockSplitter + * ZSTD_c_blockSplitterLevel + * ZSTD_c_splitAfterSequences * ZSTD_c_useRowMatchFinder + * ZSTD_c_prefetchCDictTables + * ZSTD_c_enableSeqProducerFallback + * ZSTD_c_maxBlockSize * Because they are not stable, it's necessary to define ZSTD_STATIC_L= INKING_ONLY to access them. * note : never ever use experimentalParam? names directly; * also, the enums values themselves are unstable and can still= change. 
@@ -421,7 +503,7 @@ typedef enum { ZSTD_c_experimentalParam3=3D1000, ZSTD_c_experimentalParam4=3D1001, ZSTD_c_experimentalParam5=3D1002, - ZSTD_c_experimentalParam6=3D1003, + /* was ZSTD_c_experimentalParam6=3D1003; is now ZSTD_c_targetCBlockSi= ze */ ZSTD_c_experimentalParam7=3D1004, ZSTD_c_experimentalParam8=3D1005, ZSTD_c_experimentalParam9=3D1006, @@ -430,7 +512,12 @@ typedef enum { ZSTD_c_experimentalParam12=3D1009, ZSTD_c_experimentalParam13=3D1010, ZSTD_c_experimentalParam14=3D1011, - ZSTD_c_experimentalParam15=3D1012 + ZSTD_c_experimentalParam15=3D1012, + ZSTD_c_experimentalParam16=3D1013, + ZSTD_c_experimentalParam17=3D1014, + ZSTD_c_experimentalParam18=3D1015, + ZSTD_c_experimentalParam19=3D1016, + ZSTD_c_experimentalParam20=3D1017 } ZSTD_cParameter; =20 typedef struct { @@ -493,7 +580,7 @@ typedef enum { * They will be used to compress next frame. * Resetting session never fails. * - The parameters : changes all parameters back to "default". - * This removes any reference to any dictionary too. + * This also removes any reference to any dictionary or e= xternal sequence producer. * Parameters can only be changed between 2 sessions (i.e= . no compression is currently ongoing) * otherwise the reset fails, and function returns an err= or value (which can be tested using ZSTD_isError()) * - Both : similar to resetting the session, followed by resetting param= eters. @@ -502,11 +589,13 @@ ZSTDLIB_API size_t ZSTD_CCtx_reset(ZSTD_CCtx* cctx, Z= STD_ResetDirective reset); =20 /*! ZSTD_compress2() : * Behave the same as ZSTD_compressCCtx(), but compression parameters are= set using the advanced API. + * (note that this entry point doesn't even expose a compression level pa= rameter). * ZSTD_compress2() always starts a new frame. * Should cctx hold data from a previously unfinished frame, everything a= bout it is forgotten. * - Compression parameters are pushed into CCtx before starting compress= ion, using ZSTD_CCtx_set*() * - The function is always blocking, returns when compression is complet= ed. - * Hint : compression runs faster if `dstCapacity` >=3D `ZSTD_compressBo= und(srcSize)`. + * NOTE: Providing `dstCapacity >=3D ZSTD_compressBound(srcSize)` guarant= ees that zstd will have + * enough space to successfully compress the data, though it is pos= sible it fails for other reasons. * @return : compressed size written into `dst` (<=3D `dstCapacity), * or an error code if it fails (which can be tested using ZSTD_= isError()). */ @@ -543,13 +632,17 @@ typedef enum { * ZSTD_d_stableOutBuffer * ZSTD_d_forceIgnoreChecksum * ZSTD_d_refMultipleDDicts + * ZSTD_d_disableHuffmanAssembly + * ZSTD_d_maxBlockSize * Because they are not stable, it's necessary to define ZSTD_STATIC_L= INKING_ONLY to access them. * note : never ever use experimentalParam? names directly */ ZSTD_d_experimentalParam1=3D1000, ZSTD_d_experimentalParam2=3D1001, ZSTD_d_experimentalParam3=3D1002, - ZSTD_d_experimentalParam4=3D1003 + ZSTD_d_experimentalParam4=3D1003, + ZSTD_d_experimentalParam5=3D1004, + ZSTD_d_experimentalParam6=3D1005 =20 } ZSTD_dParameter; =20 @@ -604,14 +697,14 @@ typedef struct ZSTD_outBuffer_s { * A ZSTD_CStream object is required to track streaming operation. * Use ZSTD_createCStream() and ZSTD_freeCStream() to create/release resou= rces. * ZSTD_CStream objects can be reused multiple times on consecutive compre= ssion operations. -* It is recommended to re-use ZSTD_CStream since it will play nicer with = system's memory, by re-using already allocated memory. 
+* It is recommended to reuse ZSTD_CStream since it will play nicer with s= ystem's memory, by re-using already allocated memory. * * For parallel execution, use one separate ZSTD_CStream per thread. * * note : since v1.3.0, ZSTD_CStream and ZSTD_CCtx are the same thing. * * Parameters are sticky : when starting a new compression on the same con= text, -* it will re-use the same sticky parameters as previous compression sessi= on. +* it will reuse the same sticky parameters as previous compression sessio= n. * When in doubt, it's recommended to fully initialize the context before = usage. * Use ZSTD_CCtx_reset() to reset the context and ZSTD_CCtx_setParameter(), * ZSTD_CCtx_setPledgedSrcSize(), or ZSTD_CCtx_loadDictionary() and friend= s to @@ -700,6 +793,11 @@ typedef enum { * only ZSTD_e_end or ZSTD_e_flush operations are allowed. * Before starting a new compression job, or changing compressi= on parameters, * it is required to fully flush internal buffers. + * - note: if an operation ends with an error, it may leave @cctx in an u= ndefined state. + * Therefore, it's UB to invoke ZSTD_compressStream2() of ZSTD_co= mpressStream() on such a state. + * In order to be re-employed after an error, a state must be res= et, + * which can be done explicitly (ZSTD_CCtx_reset()), + * or is sometimes implied by methods starting a new compression = job (ZSTD_initCStream(), ZSTD_compressCCtx()) */ ZSTDLIB_API size_t ZSTD_compressStream2( ZSTD_CCtx* cctx, ZSTD_outBuffer* output, @@ -728,8 +826,6 @@ ZSTDLIB_API size_t ZSTD_CStreamOutSize(void); /*< rec= ommended size for output * This following is a legacy streaming API, available since v1.0+ . * It can be replaced by ZSTD_CCtx_reset() and ZSTD_compressStream2(). * It is redundant, but remains fully supported. - * Streaming in combination with advanced parameters and dictionary compre= ssion - * can only be used through the new API. *************************************************************************= *****/ =20 /*! @@ -738,6 +834,9 @@ ZSTDLIB_API size_t ZSTD_CStreamOutSize(void); /*< rec= ommended size for output * ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only); * ZSTD_CCtx_refCDict(zcs, NULL); // clear the dictionary (if any) * ZSTD_CCtx_setParameter(zcs, ZSTD_c_compressionLevel, compressionLev= el); + * + * Note that ZSTD_initCStream() clears any previously set dictionary. Use = the new API + * to compress with a dictionary. */ ZSTDLIB_API size_t ZSTD_initCStream(ZSTD_CStream* zcs, int compressionLeve= l); /*! @@ -758,7 +857,7 @@ ZSTDLIB_API size_t ZSTD_endStream(ZSTD_CStream* zcs, ZS= TD_outBuffer* output); * * A ZSTD_DStream object is required to track streaming operations. * Use ZSTD_createDStream() and ZSTD_freeDStream() to create/release resou= rces. -* ZSTD_DStream objects can be re-used multiple times. +* ZSTD_DStream objects can be re-employed multiple times. * * Use ZSTD_initDStream() to start a new decompression operation. * @return : recommended first input size @@ -768,16 +867,21 @@ ZSTDLIB_API size_t ZSTD_endStream(ZSTD_CStream* zcs, = ZSTD_outBuffer* output); * The function will update both `pos` fields. * If `input.pos < input.size`, some input has not been consumed. * It's up to the caller to present again remaining data. +* * The function tries to flush all data decoded immediately, respecting ou= tput buffer size. * If `output.pos < output.size`, decoder has flushed everything it could. 
-* But if `output.pos =3D=3D output.size`, there might be some data left w= ithin internal buffers., +* +* However, when `output.pos =3D=3D output.size`, it's more difficult to k= now. +* If @return > 0, the frame is not complete, meaning +* either there is still some data left to flush within internal buffers, +* or there is more input to read to complete the frame (or both). * In which case, call ZSTD_decompressStream() again to flush whatever rem= ains in the buffer. * Note : with no additional input provided, amount of data flushed is nec= essarily <=3D ZSTD_BLOCKSIZE_MAX. * @return : 0 when a frame is completely decoded and fully flushed, * or an error code, which can be tested using ZSTD_isError(), * or any other value > 0, which means there is still some decoding = or flushing to do to complete current frame : * the return value is a suggested next inpu= t size (just a hint for better latency) -* that will never request more than the rem= aining frame size. +* that will never request more than the rem= aining content of the compressed frame. * ************************************************************************= *******/ =20 typedef ZSTD_DCtx ZSTD_DStream; /*< DCtx and DStream are now effectively = same object (>=3D v1.3.0) */ @@ -788,13 +892,38 @@ ZSTDLIB_API size_t ZSTD_freeDStream(ZSTD_DStream* zds= ); /* accept NULL pointer =20 /*=3D=3D=3D=3D=3D Streaming decompression functions =3D=3D=3D=3D=3D*/ =20 -/* This function is redundant with the advanced API and equivalent to: +/*! ZSTD_initDStream() : + * Initialize/reset DStream state for new decompression operation. + * Call before new decompression operation using same DStream. * + * Note : This function is redundant with the advanced API and equivalent = to: * ZSTD_DCtx_reset(zds, ZSTD_reset_session_only); * ZSTD_DCtx_refDDict(zds, NULL); */ ZSTDLIB_API size_t ZSTD_initDStream(ZSTD_DStream* zds); =20 +/*! ZSTD_decompressStream() : + * Streaming decompression function. + * Call repetitively to consume full input updating it as necessary. + * Function will update both input and output `pos` fields exposing curren= t state via these fields: + * - `input.pos < input.size`, some input remaining and caller should prov= ide remaining input + * on the next call. + * - `output.pos < output.size`, decoder flushed internal output buffer. + * - `output.pos =3D=3D output.size`, unflushed data potentially present i= n the internal buffers, + * check ZSTD_decompressStream() @return value, + * if > 0, invoke it again to flush remaining data to output. + * Note : with no additional input, amount of data flushed <=3D ZSTD_BLOCK= SIZE_MAX. + * + * @return : 0 when a frame is completely decoded and fully flushed, + * or an error code, which can be tested using ZSTD_isError(), + * or any other value > 0, which means there is some decoding or= flushing to do to complete current frame. + * + * Note: when an operation returns with an error code, the @zds state may = be left in undefined state. + * It's UB to invoke `ZSTD_decompressStream()` on such a state. 
+ * In order to re-use such a state, it must be first reset, + * which can be done explicitly (`ZSTD_DCtx_reset()`), + * or is implied for operations starting some new decompression job = (`ZSTD_initDStream`, `ZSTD_decompressDCtx()`, `ZSTD_decompress_usingDict()`) + */ ZSTDLIB_API size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_outBuffer= * output, ZSTD_inBuffer* input); =20 ZSTDLIB_API size_t ZSTD_DStreamInSize(void); /*!< recommended size for = input buffer */ @@ -913,7 +1042,7 @@ ZSTDLIB_API unsigned ZSTD_getDictID_fromDDict(const ZS= TD_DDict* ddict); * If @return =3D=3D 0, the dictID could not be decoded. * This could for one of the following reasons : * - The frame does not require a dictionary to be decoded (most common c= ase). - * - The frame was built with dictID intentionally removed. Whatever dict= ionary is necessary is a hidden information. + * - The frame was built with dictID intentionally removed. Whatever dict= ionary is necessary is a hidden piece of information. * Note : this use case also happens when using a non-conformant dictio= nary. * - `srcSize` is too small, and as a result, the frame header could not = be decoded (only possible if `srcSize < ZSTD_FRAMEHEADERSIZE_MAX`). * - This is not a Zstandard frame. @@ -925,9 +1054,11 @@ ZSTDLIB_API unsigned ZSTD_getDictID_fromFrame(const v= oid* src, size_t srcSize); * Advanced dictionary and prefix API (Requires v1.4.0+) * * This API allows dictionaries to be used with ZSTD_compress2(), - * ZSTD_compressStream2(), and ZSTD_decompressDCtx(). Dictionaries are sti= cky, and - * only reset with the context is reset with ZSTD_reset_parameters or - * ZSTD_reset_session_and_parameters. Prefixes are single-use. + * ZSTD_compressStream2(), and ZSTD_decompressDCtx(). + * Dictionaries are sticky, they remain valid when same context is reused, + * they only reset when the context is reset + * with ZSTD_reset_parameters or ZSTD_reset_session_and_parameters. + * In contrast, Prefixes are single-use. *************************************************************************= *****/ =20 =20 @@ -937,8 +1068,9 @@ ZSTDLIB_API unsigned ZSTD_getDictID_fromFrame(const vo= id* src, size_t srcSize); * @result : 0, or an error code (which can be tested with ZSTD_isError()). * Special: Loading a NULL (or 0-size) dictionary invalidates previous di= ctionary, * meaning "return to no-dictionary mode". - * Note 1 : Dictionary is sticky, it will be used for all future compress= ed frames. - * To return to "no-dictionary" situation, load a NULL dictionar= y (or reset parameters). + * Note 1 : Dictionary is sticky, it will be used for all future compress= ed frames, + * until parameters are reset, a new dictionary is loaded, or th= e dictionary + * is explicitly invalidated by loading a NULL dictionary. * Note 2 : Loading a dictionary involves building tables. * It's also a CPU consuming operation, with non-negligible impa= ct on latency. * Tables are dependent on compression parameters, and for this = reason, @@ -947,11 +1079,15 @@ ZSTDLIB_API unsigned ZSTD_getDictID_fromFrame(const = void* src, size_t srcSize); * Use experimental ZSTD_CCtx_loadDictionary_byReference() to re= ference content instead. * In such a case, dictionary buffer must outlive its users. * Note 4 : Use ZSTD_CCtx_loadDictionary_advanced() - * to precisely select how dictionary content must be interprete= d. */ + * to precisely select how dictionary content must be interprete= d. + * Note 5 : This method does not benefit from LDM (long distance mode). 
+ * If you want to employ LDM on some large dictionary content, + * prefer employing ZSTD_CCtx_refPrefix() described below. + */ ZSTDLIB_API size_t ZSTD_CCtx_loadDictionary(ZSTD_CCtx* cctx, const void* d= ict, size_t dictSize); =20 /*! ZSTD_CCtx_refCDict() : Requires v1.4.0+ - * Reference a prepared dictionary, to be used for all next compressed fr= ames. + * Reference a prepared dictionary, to be used for all future compressed = frames. * Note that compression parameters are enforced from within CDict, * and supersede any compression parameter previously set within CCtx. * The parameters ignored are labelled as "superseded-by-cdict" in the ZS= TD_cParameter enum docs. @@ -970,6 +1106,7 @@ ZSTDLIB_API size_t ZSTD_CCtx_refCDict(ZSTD_CCtx* cctx,= const ZSTD_CDict* cdict); * Decompression will need same prefix to properly regenerate data. * Compressing with a prefix is similar in outcome as performing a diff a= nd compressing it, * but performs much faster, especially during decompression (compression= speed is tunable with compression level). + * This method is compatible with LDM (long distance mode). * @result : 0, or an error code (which can be tested with ZSTD_isError()). * Special: Adding any prefix (including NULL) invalidates any previous p= refix or dictionary * Note 1 : Prefix buffer is referenced. It **must** outlive compression. @@ -986,9 +1123,9 @@ ZSTDLIB_API size_t ZSTD_CCtx_refPrefix(ZSTD_CCtx* cctx, const void* prefix, size_t prefixSize); =20 /*! ZSTD_DCtx_loadDictionary() : Requires v1.4.0+ - * Create an internal DDict from dict buffer, - * to be used to decompress next frames. - * The dictionary remains valid for all future frames, until explicitly i= nvalidated. + * Create an internal DDict from dict buffer, to be used to decompress al= l future frames. + * The dictionary remains valid for all future frames, until explicitly i= nvalidated, or + * a new dictionary is loaded. * @result : 0, or an error code (which can be tested with ZSTD_isError()). * Special : Adding a NULL (or 0-size) dictionary invalidates any previou= s dictionary, * meaning "return to no-dictionary mode". @@ -1012,9 +1149,10 @@ ZSTDLIB_API size_t ZSTD_DCtx_loadDictionary(ZSTD_DCt= x* dctx, const void* dict, s * The memory for the table is allocated on the first call to refDDict, a= nd can be * freed with ZSTD_freeDCtx(). * + * If called with ZSTD_d_refMultipleDDicts disabled (the default), only o= ne dictionary + * will be managed, and referencing a dictionary effectively "discards" a= ny previous one. + * * @result : 0, or an error code (which can be tested with ZSTD_isError()). - * Note 1 : Currently, only one dictionary can be managed. - * Referencing a new dictionary effectively "discards" any previ= ous one. * Special: referencing a NULL DDict means "return to no-dictionary mode". * Note 2 : DDict is just referenced, its lifetime must outlive its usage= from DCtx. */ @@ -1051,6 +1189,7 @@ ZSTDLIB_API size_t ZSTD_sizeof_DStream(const ZSTD_DSt= ream* zds); ZSTDLIB_API size_t ZSTD_sizeof_CDict(const ZSTD_CDict* cdict); ZSTDLIB_API size_t ZSTD_sizeof_DDict(const ZSTD_DDict* ddict); =20 + #endif /* ZSTD_H_235446 */ =20 =20 @@ -1066,29 +1205,12 @@ ZSTDLIB_API size_t ZSTD_sizeof_DDict(const ZSTD_DDi= ct* ddict); #if !defined(ZSTD_H_ZSTD_STATIC_LINKING_ONLY) #define ZSTD_H_ZSTD_STATIC_LINKING_ONLY =20 + /* This can be overridden externally to hide static symbols. 
*/ #ifndef ZSTDLIB_STATIC_API #define ZSTDLIB_STATIC_API ZSTDLIB_VISIBLE #endif =20 -/* Deprecation warnings : - * Should these warnings be a problem, it is generally possible to disable= them, - * typically with -Wno-deprecated-declarations for gcc or _CRT_SECURE_NO_W= ARNINGS in Visual. - * Otherwise, it's also possible to define ZSTD_DISABLE_DEPRECATE_WARNINGS. - */ -#ifdef ZSTD_DISABLE_DEPRECATE_WARNINGS -# define ZSTD_DEPRECATED(message) ZSTDLIB_STATIC_API /* disable deprecat= ion warnings */ -#else -# if (defined(GNUC) && (GNUC > 4 || (GNUC =3D=3D 4 && GNUC_MINOR >=3D 5))= ) || defined(__clang__) -# define ZSTD_DEPRECATED(message) ZSTDLIB_STATIC_API __attribute__((dep= recated(message))) -# elif (__GNUC__ >=3D 3) -# define ZSTD_DEPRECATED(message) ZSTDLIB_STATIC_API __attribute__((dep= recated)) -# else -# pragma message("WARNING: You need to implement ZSTD_DEPRECATED for th= is compiler") -# define ZSTD_DEPRECATED(message) ZSTDLIB_STATIC_API -# endif -#endif /* ZSTD_DISABLE_DEPRECATE_WARNINGS */ - /* ***********************************************************************= *************** * experimental API (static linking only) *************************************************************************= *************** @@ -1123,6 +1245,7 @@ ZSTDLIB_API size_t ZSTD_sizeof_DDict(const ZSTD_DDict= * ddict); #define ZSTD_TARGETLENGTH_MIN 0 /* note : comparing this constant to= an unsigned results in a tautological test */ #define ZSTD_STRATEGY_MIN ZSTD_fast #define ZSTD_STRATEGY_MAX ZSTD_btultra2 +#define ZSTD_BLOCKSIZE_MAX_MIN (1 << 10) /* The minimum valid max blocksiz= e. Maximum blocksizes smaller than this make compressBound() inaccurate. */ =20 =20 #define ZSTD_OVERLAPLOG_MIN 0 @@ -1146,7 +1269,7 @@ ZSTDLIB_API size_t ZSTD_sizeof_DDict(const ZSTD_DDict= * ddict); #define ZSTD_LDM_HASHRATELOG_MAX (ZSTD_WINDOWLOG_MAX - ZSTD_HASHLOG_MIN) =20 /* Advanced parameter bounds */ -#define ZSTD_TARGETCBLOCKSIZE_MIN 64 +#define ZSTD_TARGETCBLOCKSIZE_MIN 1340 /* suitable to fit into an ethern= et / wifi / 4G transport frame */ #define ZSTD_TARGETCBLOCKSIZE_MAX ZSTD_BLOCKSIZE_MAX #define ZSTD_SRCSIZEHINT_MIN 0 #define ZSTD_SRCSIZEHINT_MAX INT_MAX @@ -1188,7 +1311,7 @@ typedef struct { * * Note: This field is optional. ZSTD_genera= teSequences() will calculate the value of * 'rep', but repeat offsets do not necessar= ily need to be calculated from an external - * sequence provider's perspective. For exam= ple, ZSTD_compressSequences() does not + * sequence provider perspective. For exampl= e, ZSTD_compressSequences() does not * use this 'rep' field at all (as of now). */ } ZSTD_Sequence; @@ -1293,17 +1416,18 @@ typedef enum { } ZSTD_literalCompressionMode_e; =20 typedef enum { - /* Note: This enum controls features which are conditionally beneficial.= Zstd typically will make a final - * decision on whether or not to enable the feature (ZSTD_ps_auto), but = setting the switch to ZSTD_ps_enable - * or ZSTD_ps_disable allow for a force enable/disable the feature. + /* Note: This enum controls features which are conditionally beneficial. + * Zstd can take a decision on whether or not to enable the feature (ZST= D_ps_auto), + * but setting the switch to ZSTD_ps_enable or ZSTD_ps_disable force ena= ble/disable the feature. 
   */
  ZSTD_ps_auto = 0,         /* Let the library automatically determine whether the feature shall be enabled */
  ZSTD_ps_enable = 1,       /* Force-enable the feature */
  ZSTD_ps_disable = 2       /* Do not use the feature */
-} ZSTD_paramSwitch_e;
+} ZSTD_ParamSwitch_e;
+#define ZSTD_paramSwitch_e ZSTD_ParamSwitch_e  /* old name */
 
 /* *************************************
-*  Frame size functions
+*  Frame header and size functions
 ***************************************/
 
 /*! ZSTD_findDecompressedSize() :
@@ -1345,34 +1469,130 @@ ZSTDLIB_STATIC_API unsigned long long ZSTD_findDecompressedSize(const void* src,
 ZSTDLIB_STATIC_API unsigned long long ZSTD_decompressBound(const void* src, size_t srcSize);
 
 /*! ZSTD_frameHeaderSize() :
- * srcSize must be >= ZSTD_FRAMEHEADERSIZE_PREFIX.
+ * srcSize must be large enough, aka >= ZSTD_FRAMEHEADERSIZE_PREFIX.
 * @return : size of the Frame Header,
 *           or an error code (if srcSize is too small) */
 ZSTDLIB_STATIC_API size_t ZSTD_frameHeaderSize(const void* src, size_t srcSize);
 
+typedef enum { ZSTD_frame, ZSTD_skippableFrame } ZSTD_FrameType_e;
+#define ZSTD_frameType_e ZSTD_FrameType_e  /* old name */
+typedef struct {
+    unsigned long long frameContentSize; /* if == ZSTD_CONTENTSIZE_UNKNOWN, it means this field is not available. 0 means "empty" */
+    unsigned long long windowSize;       /* can be very large, up to <= frameContentSize */
+    unsigned blockSizeMax;
+    ZSTD_FrameType_e frameType;          /* if == ZSTD_skippableFrame, frameContentSize is the size of skippable content */
+    unsigned headerSize;
+    unsigned dictID;                     /* for ZSTD_skippableFrame, contains the skippable magic variant [0-15] */
+    unsigned checksumFlag;
+    unsigned _reserved1;
+    unsigned _reserved2;
+} ZSTD_FrameHeader;
+#define ZSTD_frameHeader ZSTD_FrameHeader  /* old name */
+
+/*! ZSTD_getFrameHeader() :
+ *  decode Frame Header into `zfhPtr`, or requires larger `srcSize`.
+ * @return : 0 => header is complete, `zfhPtr` is correctly filled,
+ *          >0 => `srcSize` is too small, @return value is the wanted `srcSize` amount, `zfhPtr` is not filled,
+ *           or an error code, which can be tested using ZSTD_isError() */
+ZSTDLIB_STATIC_API size_t ZSTD_getFrameHeader(ZSTD_FrameHeader* zfhPtr, const void* src, size_t srcSize);
+/*! ZSTD_getFrameHeader_advanced() :
+ *  same as ZSTD_getFrameHeader(),
+ *  with added capability to select a format (like ZSTD_f_zstd1_magicless) */
+ZSTDLIB_STATIC_API size_t ZSTD_getFrameHeader_advanced(ZSTD_FrameHeader* zfhPtr, const void* src, size_t srcSize, ZSTD_format_e format);
+
+/*! ZSTD_decompressionMargin() :
+ * Zstd supports in-place decompression, where the input and output buffers overlap.
+ * In this case, the output buffer must be at least (Margin + Output_Size) bytes large,
+ * and the input buffer must be at the end of the output buffer.
+ *
+ *  _______________________ Output Buffer ________________________
+ * |                                                              |
+ * |                                        ____ Input Buffer ____|
+ * |                                       |                      |
+ * v                                       v                      v
+ * |---------------------------------------|-----------|----------|
+ * ^                                                   ^          ^
+ * |___________________ Output_Size ___________________|_ Margin _|
+ *
+ * NOTE: See also ZSTD_DECOMPRESSION_MARGIN().
+ * NOTE: This applies only to single-pass decompression through ZSTD_decompress() or
+ * ZSTD_decompressDCtx().
+ * NOTE: This function supports multi-frame input.
+ *
+ * @param src The compressed frame(s)
+ * @param srcSize The size of the compressed frame(s)
+ * @returns The decompression margin or an error that can be checked with ZSTD_isError().
+ */
+ZSTDLIB_STATIC_API size_t ZSTD_decompressionMargin(const void* src, size_t srcSize);
+
+/*! ZSTD_DECOMPRESS_MARGIN() :
+ * Similar to ZSTD_decompressionMargin(), but instead of computing the margin from
+ * the compressed frame, compute it from the original size and the blockSizeLog.
+ * See ZSTD_decompressionMargin() for details.
+ *
+ * WARNING: This macro does not support multi-frame input, the input must be a single
+ * zstd frame. If you need that support use the function, or implement it yourself.
+ *
+ * @param originalSize The original uncompressed size of the data.
+ * @param blockSize    The block size == MIN(windowSize, ZSTD_BLOCKSIZE_MAX).
+ *                     Unless you explicitly set the windowLog smaller than
+ *                     ZSTD_BLOCKSIZELOG_MAX you can just use ZSTD_BLOCKSIZE_MAX.
+ */
+#define ZSTD_DECOMPRESSION_MARGIN(originalSize, blockSize) ((size_t)(                                              \
+        ZSTD_FRAMEHEADERSIZE_MAX                                                              /* Frame header */ + \
+        4                                                                                         /* checksum */ + \
+        ((originalSize) == 0 ? 0 : 3 * (((originalSize) + (blockSize) - 1) / blockSize)) /* 3 bytes per block */ + \
+        (blockSize)                                                                    /* One block of margin */   \
+    ))
+
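To make the margin arithmetic concrete, an illustrative sketch (not from upstream) of in-place decompression using the macro above. It assumes default block sizing and a single frame; the buffer names are placeholders and the error sentinel is sketch-level only.

    #include <string.h>
    #include <zstd.h>

    /* Sketch: decompress a frame in place. The single buffer holds the
     * compressed data at its tail; output is written from the front. */
    size_t decompress_in_place(void* buffer, size_t bufferCap,
                               const void* frame, size_t frameSize,
                               size_t originalSize)
    {
        size_t const margin = ZSTD_DECOMPRESSION_MARGIN(originalSize, ZSTD_BLOCKSIZE_MAX);
        if (bufferCap < originalSize + margin) return (size_t)-1;  /* buffer too small (not a zstd error code) */
        /* place the input at the very end of the output buffer */
        memmove((char*)buffer + bufferCap - frameSize, frame, frameSize);
        return ZSTD_decompress(buffer, originalSize,
                               (char*)buffer + bufferCap - frameSize, frameSize);
    }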
 typedef enum {
-  ZSTD_sf_noBlockDelimiters = 0,         /* Representation of ZSTD_Sequence has no block delimiters, sequences only */
-  ZSTD_sf_explicitBlockDelimiters = 1    /* Representation of ZSTD_Sequence contains explicit block delimiters */
-} ZSTD_sequenceFormat_e;
+  ZSTD_sf_noBlockDelimiters = 0,         /* ZSTD_Sequence[] has no block delimiters, just sequences */
+  ZSTD_sf_explicitBlockDelimiters = 1    /* ZSTD_Sequence[] contains explicit block delimiters */
+} ZSTD_SequenceFormat_e;
+#define ZSTD_sequenceFormat_e ZSTD_SequenceFormat_e  /* old name */
+
+/*! ZSTD_sequenceBound() :
+ * `srcSize` : size of the input buffer
+ * @return : upper-bound for the number of sequences that can be generated
+ *           from a buffer of srcSize bytes
+ *
+ * note : returns number of sequences - to get bytes, multiply by sizeof(ZSTD_Sequence).
+ */
+ZSTDLIB_STATIC_API size_t ZSTD_sequenceBound(size_t srcSize);
 
 /*! ZSTD_generateSequences() :
- * Generate sequences using ZSTD_compress2, given a source buffer.
+ * WARNING: This function is meant for debugging and informational purposes ONLY!
+ * Its implementation is flawed, and it will be deleted in a future version.
+ * It is not guaranteed to succeed, as there are several cases where it will give
+ * up and fail. You should NOT use this function in production code.
+ *
+ * This function is deprecated, and will be removed in a future version.
+ *
+ * Generate sequences using ZSTD_compress2(), given a source buffer.
+ *
+ * @param zc The compression context to be used for ZSTD_compress2(). Set any
+ *           compression parameters you need on this context.
+ * @param outSeqs The output sequences buffer of size @p outSeqsSize
+ * @param outSeqsCapacity The size of the output sequences buffer.
+ *                        ZSTD_sequenceBound(srcSize) is an upper bound on the number
+ *                        of sequences that can be generated.
+ * @param src The source buffer to generate sequences from of size @p srcSize.
+ * @param srcSize The size of the source buffer.
 *
 * Each block will end with a dummy sequence
 * with offset == 0, matchLength == 0, and litLength == length of last literals.
 * litLength may be == 0, and if so, then the sequence of (of: 0 ml: 0 ll: 0)
 * simply acts as a block delimiter.
 *
- * zc can be used to insert custom compression params.
- * This function invokes ZSTD_compress2
- *
- * The output of this function can be fed into ZSTD_compressSequences() with CCtx
- * setting of ZSTD_c_blockDelimiters as ZSTD_sf_explicitBlockDelimiters
- * @return : number of sequences generated
+ * @returns The number of sequences generated, necessarily less than
+ *          ZSTD_sequenceBound(srcSize), or an error code that can be checked
+ *          with ZSTD_isError().
 */
-
-ZSTDLIB_STATIC_API size_t ZSTD_generateSequences(ZSTD_CCtx* zc, ZSTD_Sequence* outSeqs,
-                                                 size_t outSeqsSize, const void* src, size_t srcSize);
+ZSTD_DEPRECATED("For debugging only, will be replaced by ZSTD_extractSequences()")
+ZSTDLIB_STATIC_API size_t
+ZSTD_generateSequences(ZSTD_CCtx* zc,
+                       ZSTD_Sequence* outSeqs, size_t outSeqsCapacity,
+                       const void* src, size_t srcSize);
 
 /*! ZSTD_mergeBlockDelimiters() :
 * Given an array of ZSTD_Sequence, remove all sequences that represent block delimiters/last literals
@@ -1388,8 +1608,10 @@ ZSTDLIB_STATIC_API size_t ZSTD_generateSequences(ZSTD_CCtx* zc, ZSTD_Sequence* o
 ZSTDLIB_STATIC_API size_t ZSTD_mergeBlockDelimiters(ZSTD_Sequence* sequences, size_t seqsSize);
 
 /*! ZSTD_compressSequences() :
- * Compress an array of ZSTD_Sequence, generated from the original source buffer, into dst.
- * If a dictionary is included, then the cctx should reference the dict. (see: ZSTD_CCtx_refCDict(), ZSTD_CCtx_loadDictionary(), etc.)
+ * Compress an array of ZSTD_Sequence, associated with @src buffer, into dst.
+ * @src contains the entire input (not just the literals).
+ * If @srcSize > sum(sequence.length), the remaining bytes are considered all literals.
+ * If a dictionary is included, then the cctx should reference the dict (see: ZSTD_CCtx_refCDict(), ZSTD_CCtx_loadDictionary(), etc.).
 * The entire source is compressed into a single frame.
 *
 * The compression behavior changes based on cctx params. In particular:
@@ -1398,11 +1620,17 @@ ZSTDLIB_STATIC_API size_t ZSTD_mergeBlockDelimiters(ZSTD_Sequence* sequences, si
 *    the block size derived from the cctx, and sequences may be split. This is the default setting.
 *
 *    If ZSTD_c_blockDelimiters == ZSTD_sf_explicitBlockDelimiters, the array of ZSTD_Sequence is expected to contain
- *    block delimiters (defined in ZSTD_Sequence). Behavior is undefined if no block delimiters are provided.
+ *    valid block delimiters (defined in ZSTD_Sequence). Behavior is undefined if no block delimiters are provided.
+ *
+ *    When ZSTD_c_blockDelimiters == ZSTD_sf_explicitBlockDelimiters, it's possible to decide whether to generate repcodes
+ *    using the advanced parameter ZSTD_c_repcodeResolution. Repcodes will improve compression ratio, though the benefit
+ *    can vary greatly depending on Sequences. On the other hand, repcode resolution is an expensive operation.
+ *    By default, it's disabled at low (<10) compression levels, and enabled above the threshold (>=10).
+ *    ZSTD_c_repcodeResolution makes it possible to directly manage this processing in either direction.
+ *
- *    If ZSTD_c_validateSequences == 0, this function will blindly accept the sequences provided. Invalid sequences cause undefined
- *    behavior. If ZSTD_c_validateSequences == 1, then if sequence is invalid (see doc/zstd_compression_format.md for
- *    specifics regarding offset/matchlength requirements) then the function will bail out and return an error.
+ *    If ZSTD_c_validateSequences == 0, this function blindly accepts the Sequences provided. Invalid Sequences cause undefined
+ *    behavior. If ZSTD_c_validateSequences == 1, then the function will detect invalid Sequences (see doc/zstd_compression_format.md for
+ *    specifics regarding offset/matchlength requirements) and bail out and return an error.
 *
 *    In addition to the two adjustable experimental params, there are other important cctx params.
 *    - ZSTD_c_minMatch MUST be set as less than or equal to the smallest match generated by the match finder. It has a minimum value of ZSTD_MINMATCH_MIN.
@@ -1410,14 +1638,42 @@ ZSTDLIB_STATIC_API size_t ZSTD_mergeBlockDelimiters(ZSTD_Sequence* sequences, si
 *    - ZSTD_c_windowLog affects offset validation: this function will return an error at higher debug levels if a provided offset
 *      is larger than what the spec allows for a given window log and dictionary (if present). See: doc/zstd_compression_format.md
 *
- * Note: Repcodes are, as of now, always re-calculated within this function, so ZSTD_Sequence::rep is unused.
- * Note 2: Once we integrate ability to ingest repcodes, the explicit block delims mode must respect those repcodes exactly,
- *         and cannot emit an RLE block that disagrees with the repcode history
- * @return : final compressed size or a ZSTD error.
- */
-ZSTDLIB_STATIC_API size_t ZSTD_compressSequences(ZSTD_CCtx* const cctx, void* dst, size_t dstSize,
-                                                 const ZSTD_Sequence* inSeqs, size_t inSeqsSize,
-                                                 const void* src, size_t srcSize);
+ * Note: Repcodes are, as of now, always re-calculated within this function, ZSTD_Sequence.rep is effectively unused.
+ * Dev Note: Once the ability to ingest repcodes becomes available, the explicit block delims mode must respect those repcodes exactly,
+ *           and cannot emit an RLE block that disagrees with the repcode history.
+ * @return : final compressed size, or a ZSTD error code.
+ */
+ZSTDLIB_STATIC_API size_t
+ZSTD_compressSequences(ZSTD_CCtx* cctx,
+                       void* dst, size_t dstCapacity,
+                       const ZSTD_Sequence* inSeqs, size_t inSeqsSize,
+                       const void* src, size_t srcSize);
+
+
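As an illustrative usage sketch (not part of the header): compressing from a caller-supplied sequence parse in explicit-delimiter mode, with validation enabled so a malformed parse fails cleanly instead of producing undefined behavior. `seqs`/`nbSeqs` are assumed to be a valid parse of `src` that includes block delimiters.

    #include <zstd.h>

    /* Sketch: compress `src` from a caller-provided sequence parse. */
    size_t compress_from_sequences(ZSTD_CCtx* cctx,
                                   void* dst, size_t dstCap,
                                   const ZSTD_Sequence* seqs, size_t nbSeqs,
                                   const void* src, size_t srcSize)
    {
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_blockDelimiters,
                               ZSTD_sf_explicitBlockDelimiters);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_validateSequences, 1);  /* error out instead of UB */
        return ZSTD_compressSequences(cctx, dst, dstCap, seqs, nbSeqs, src, srcSize);
    }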
+/*! ZSTD_compressSequencesAndLiterals() :
+ * This is a variant of ZSTD_compressSequences() which,
+ * instead of receiving (src,srcSize) as input parameter, receives (literals,litSize),
+ * aka all the literals, already extracted and laid out into a single continuous buffer.
+ * This can be useful if the process generating the sequences also happens to generate the buffer of literals,
+ * thus skipping an extraction + caching stage.
+ * It's a speed optimization, useful when the right conditions are met,
+ * but it also features the following limitations:
+ * - Only supports explicit delimiter mode
+ * - Currently does not support Sequences validation (so input Sequences are trusted)
+ * - Not compatible with frame checksum, which must be disabled
+ * - If any block is incompressible, will fail and return an error
+ * - @litSize must be == sum of all @.litLength fields in @inSeqs. Any discrepancy will generate an error.
+ * - @litBufCapacity is the size of the underlying buffer into which literals are written, starting at address @literals.
+ *   @litBufCapacity must be at least 8 bytes larger than @litSize.
+ * - @decompressedSize must be correct, and correspond to the sum of all Sequences. Any discrepancy will generate an error.
+ * @return : final compressed size, or a ZSTD error code.
+ */
+ZSTDLIB_STATIC_API size_t
+ZSTD_compressSequencesAndLiterals(ZSTD_CCtx* cctx,
+                                  void* dst, size_t dstCapacity,
+                                  const ZSTD_Sequence* inSeqs, size_t nbSequences,
+                                  const void* literals, size_t litSize, size_t litBufCapacity,
+                                  size_t decompressedSize);
 
 
 /*! ZSTD_writeSkippableFrame() :
@@ -1425,8 +1681,8 @@ ZSTDLIB_STATIC_API size_t ZSTD_compressSequences(ZSTD_CCtx* const cctx, void* ds
 *
 * Skippable frames begin with a 4-byte magic number. There are 16 possible choices of magic number,
 * ranging from ZSTD_MAGIC_SKIPPABLE_START to ZSTD_MAGIC_SKIPPABLE_START+15.
- * As such, the parameter magicVariant controls the exact skippable frame magic number variant used, so
- * the magic number used will be ZSTD_MAGIC_SKIPPABLE_START + magicVariant.
+ * As such, the parameter magicVariant controls the exact skippable frame magic number variant used,
+ * so the magic number used will be ZSTD_MAGIC_SKIPPABLE_START + magicVariant.
 *
 * Returns an error if destination buffer is not large enough, if the source size is not representable
 * with a 4-byte unsigned int, or if the parameter magicVariant is greater than 15 (and therefore invalid).
@@ -1434,26 +1690,28 @@ ZSTDLIB_STATIC_API size_t ZSTD_compressSequences(ZSTD_CCtx* const cctx, void* ds
 * @return : number of bytes written or a ZSTD error.
 */
 ZSTDLIB_STATIC_API size_t ZSTD_writeSkippableFrame(void* dst, size_t dstCapacity,
-                                            const void* src, size_t srcSize, unsigned magicVariant);
+                                            const void* src, size_t srcSize,
+                                            unsigned magicVariant);
 
 /*! ZSTD_readSkippableFrame() :
- * Retrieves a zstd skippable frame containing data given by src, and writes it to dst buffer.
+ * Retrieves the content of a zstd skippable frame starting at @src, and writes it to @dst buffer.
 *
- * The parameter magicVariant will receive the magicVariant that was supplied when the frame was written,
- * i.e. magicNumber - ZSTD_MAGIC_SKIPPABLE_START.  This can be NULL if the caller is not interested
- * in the magicVariant.
+ * The parameter @magicVariant will receive the magicVariant that was supplied when the frame was written,
+ * i.e. magicNumber - ZSTD_MAGIC_SKIPPABLE_START.
+ * This can be NULL if the caller is not interested in the magicVariant.
 *
 * Returns an error if destination buffer is not large enough, or if the frame is not skippable.
 *
 * @return : number of bytes written or a ZSTD error.
 */
-ZSTDLIB_API size_t ZSTD_readSkippableFrame(void* dst, size_t dstCapacity, unsigned* magicVariant,
-                                           const void* src, size_t srcSize);
+ZSTDLIB_STATIC_API size_t ZSTD_readSkippableFrame(void* dst, size_t dstCapacity,
+                                                  unsigned* magicVariant,
+                                                  const void* src, size_t srcSize);
 
 /*! ZSTD_isSkippableFrame() :
 *  Tells if the content of `buffer` starts with a valid Frame Identifier for a skippable frame.
 */
-ZSTDLIB_API unsigned ZSTD_isSkippableFrame(const void* buffer, size_t size);
+ZSTDLIB_STATIC_API unsigned ZSTD_isSkippableFrame(const void* buffer, size_t size);
 
 
 
@@ -1464,48 +1722,59 @@ ZSTDLIB_API unsigned ZSTD_isSkippableFrame(const void* buffer, size_t size);
 
 /*! ZSTD_estimate*() :
 *  These functions make it possible to estimate memory usage
 *  of a future {D,C}Ctx, before its creation.
+ *  This is useful in combination with ZSTD_initStatic(),
+ *  which makes it possible to employ a static buffer for ZSTD_CCtx* state.
 *
 *  ZSTD_estimateCCtxSize() will provide a memory budget large enough
- *  for any compression level up to selected one.
- *  Note : Unlike ZSTD_estimateCStreamSize*(), this estimate
- *         does not include space for a window buffer.
- *         Therefore, the estimation is only guaranteed for single-shot compressions, not streaming.
+ *  to compress data of any size using one-shot compression ZSTD_compressCCtx() or ZSTD_compress2()
+ *  associated with any compression level up to the max specified one.
 *  The estimate will assume the input may be arbitrarily large,
 *  which is the worst case.
 *
+ *  Note that the size estimation is specific for one-shot compression,
+ *  it is not valid for streaming (see ZSTD_estimateCStreamSize*())
+ *  nor other potential ways of using a ZSTD_CCtx* state.
+ *
 *  When srcSize can be bound by a known and rather "small" value,
- *  this fact can be used to provide a tighter estimation
- *  because the CCtx compression context will need less memory.
- *  This tighter estimation can be provided by more advanced functions
+ *  this knowledge can be used to provide a tighter budget estimation
+ *  because the ZSTD_CCtx* state will need less memory for small inputs.
+ *  This tighter estimation can be provided by employing more advanced functions
 *  ZSTD_estimateCCtxSize_usingCParams(), which can be used in tandem with ZSTD_getCParams(),
 *  and ZSTD_estimateCCtxSize_usingCCtxParams(), which can be used in tandem with ZSTD_CCtxParams_setParameter().
 *  Both can be used to estimate memory using custom compression parameters and arbitrary srcSize limits.
 *
- *  Note 2 : only single-threaded compression is supported.
+ *  Note : only single-threaded compression is supported.
 *  ZSTD_estimateCCtxSize_usingCCtxParams() will return an error code if ZSTD_c_nbWorkers is >= 1.
 */
-ZSTDLIB_STATIC_API size_t ZSTD_estimateCCtxSize(int compressionLevel);
+ZSTDLIB_STATIC_API size_t ZSTD_estimateCCtxSize(int maxCompressionLevel);
 ZSTDLIB_STATIC_API size_t ZSTD_estimateCCtxSize_usingCParams(ZSTD_compressionParameters cParams);
 ZSTDLIB_STATIC_API size_t ZSTD_estimateCCtxSize_usingCCtxParams(const ZSTD_CCtx_params* params);
 ZSTDLIB_STATIC_API size_t ZSTD_estimateDCtxSize(void);
 
 /*! ZSTD_estimateCStreamSize() :
- *  ZSTD_estimateCStreamSize() will provide a budget large enough for any compression level up to selected one.
- *  It will also consider src size to be arbitrarily "large", which is worst case.
+ *  ZSTD_estimateCStreamSize() will provide a memory budget large enough for streaming compression
+ *  using any compression level up to the max specified one.
+ *  It will also consider src size to be arbitrarily "large", which is a worst case scenario.
 *  If srcSize is known to always be small, ZSTD_estimateCStreamSize_usingCParams() can provide a tighter estimation.
 *  ZSTD_estimateCStreamSize_usingCParams() can be used in tandem with ZSTD_getCParams() to create cParams from compressionLevel.
 *  ZSTD_estimateCStreamSize_usingCCtxParams() can be used in tandem with ZSTD_CCtxParams_setParameter(). Only single-threaded compression is supported. This function will return an error code if ZSTD_c_nbWorkers is >= 1.
 *  Note : CStream size estimation is only correct for single-threaded compression.
- *  ZSTD_DStream memory budget depends on window Size.
+ *  ZSTD_estimateCStreamSize_usingCCtxParams() will return an error code if ZSTD_c_nbWorkers is >= 1.
+ *  Note 2 : ZSTD_estimateCStreamSize* functions are not compatible with the Block-Level Sequence Producer API at this time.
+ *  Size estimates assume that no external sequence producer is registered.
+ *
+ *  ZSTD_DStream memory budget depends on frame's window Size.
 *  This information can be passed manually, using ZSTD_estimateDStreamSize,
 *  or deducted from a valid frame Header, using ZSTD_estimateDStreamSize_fromFrame();
+ *  Any frame requesting a window size larger than the max specified one will be rejected.
 *  Note : if streaming is init with function ZSTD_init?Stream_usingDict(),
 *         an internal ?Dict will be created, which additional size is not estimated here.
- *         In this case, get total size by adding ZSTD_estimate?DictSize */
-ZSTDLIB_STATIC_API size_t ZSTD_estimateCStreamSize(int compressionLevel);
+ *         In this case, get total size by adding ZSTD_estimate?DictSize
+ */
+ZSTDLIB_STATIC_API size_t ZSTD_estimateCStreamSize(int maxCompressionLevel);
 ZSTDLIB_STATIC_API size_t ZSTD_estimateCStreamSize_usingCParams(ZSTD_compressionParameters cParams);
 ZSTDLIB_STATIC_API size_t ZSTD_estimateCStreamSize_usingCCtxParams(const ZSTD_CCtx_params* params);
-ZSTDLIB_STATIC_API size_t ZSTD_estimateDStreamSize(size_t windowSize);
+ZSTDLIB_STATIC_API size_t ZSTD_estimateDStreamSize(size_t maxWindowSize);
 ZSTDLIB_STATIC_API size_t ZSTD_estimateDStreamSize_fromFrame(const void* src, size_t srcSize);
 
 /*! ZSTD_estimate?DictSize() :
@@ -1568,7 +1837,15 @@ typedef void  (*ZSTD_freeFunction) (void* opaque, void* address);
 typedef struct { ZSTD_allocFunction customAlloc; ZSTD_freeFunction customFree; void* opaque; } ZSTD_customMem;
 static __attribute__((__unused__))
+
+#if defined(__clang__) && __clang_major__ >= 5
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wzero-as-null-pointer-constant"
+#endif
 ZSTD_customMem const ZSTD_defaultCMem = { NULL, NULL, NULL };  /*< this constant defers to stdlib's functions */
+#if defined(__clang__) && __clang_major__ >= 5
+#pragma clang diagnostic pop
+#endif
 
 ZSTDLIB_STATIC_API ZSTD_CCtx*    ZSTD_createCCtx_advanced(ZSTD_customMem customMem);
 ZSTDLIB_STATIC_API ZSTD_CStream* ZSTD_createCStream_advanced(ZSTD_customMem customMem);
@@ -1649,22 +1926,45 @@ ZSTDLIB_STATIC_API size_t ZSTD_checkCParams(ZSTD_compressionParameters params);
 *  This function never fails (wide contract) */
 ZSTDLIB_STATIC_API ZSTD_compressionParameters ZSTD_adjustCParams(ZSTD_compressionParameters cPar, unsigned long long srcSize, size_t dictSize);
 
+/*! ZSTD_CCtx_setCParams() :
+ *  Set all parameters provided within @p cparams into the working @p cctx.
+ *  Note : if modifying parameters during compression (MT mode only),
+ *         note that changes to the .windowLog parameter will be ignored.
+ * @return 0 on success, or an error code (can be checked with ZSTD_isError()).
+ *         On failure, no parameters are updated.
+ */
+ZSTDLIB_STATIC_API size_t ZSTD_CCtx_setCParams(ZSTD_CCtx* cctx, ZSTD_compressionParameters cparams);
+
+/*! ZSTD_CCtx_setFParams() :
+ *  Set all parameters provided within @p fparams into the working @p cctx.
+ * @return 0 on success, or an error code (can be checked with ZSTD_isError()).
+ */
+ZSTDLIB_STATIC_API size_t ZSTD_CCtx_setFParams(ZSTD_CCtx* cctx, ZSTD_frameParameters fparams);
+
+/*! ZSTD_CCtx_setParams() :
+ *  Set all parameters provided within @p params into the working @p cctx.
+ * @return 0 on success, or an error code (can be checked with ZSTD_isError()).
+ */
+ZSTDLIB_STATIC_API size_t ZSTD_CCtx_setParams(ZSTD_CCtx* cctx, ZSTD_parameters params);
+
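A brief illustrative sketch (not from upstream) of the new bulk setter: derive cParams for a known input size and apply them in one call. `level` and the buffers are placeholders.

    #include <zstd.h>

    /* Sketch: tune cParams for a known source size, then compress. */
    size_t compress_tuned(void* dst, size_t dstCap,
                          const void* src, size_t srcSize, int level)
    {
        ZSTD_CCtx* cctx = ZSTD_createCCtx();
        ZSTD_compressionParameters cparams = ZSTD_getCParams(level, srcSize, 0);
        ZSTD_CCtx_setCParams(cctx, cparams);   /* sets all cParams at once */
        size_t const r = ZSTD_compress2(cctx, dst, dstCap, src, srcSize);
        ZSTD_freeCCtx(cctx);
        return r;
    }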
 /*! ZSTD_compress_advanced() :
 *  Note : this function is now DEPRECATED.
 *         It can be replaced by ZSTD_compress2(), in combination with ZSTD_CCtx_setParameter() and other parameter setters.
 *  This prototype will generate compilation warnings. */
 ZSTD_DEPRECATED("use ZSTD_compress2")
+ZSTDLIB_STATIC_API
 size_t ZSTD_compress_advanced(ZSTD_CCtx* cctx,
-                              void* dst, size_t dstCapacity,
-                              const void* src, size_t srcSize,
-                              const void* dict,size_t dictSize,
-                              ZSTD_parameters params);
+                  void* dst, size_t dstCapacity,
+                  const void* src, size_t srcSize,
+                  const void* dict,size_t dictSize,
+                  ZSTD_parameters params);
 
 /*! ZSTD_compress_usingCDict_advanced() :
 *  Note : this function is now DEPRECATED.
 *         It can be replaced by ZSTD_compress2(), in combination with ZSTD_CCtx_loadDictionary() and other parameter setters.
 *  This prototype will generate compilation warnings. */
 ZSTD_DEPRECATED("use ZSTD_compress2 with ZSTD_CCtx_loadDictionary")
+ZSTDLIB_STATIC_API
 size_t ZSTD_compress_usingCDict_advanced(ZSTD_CCtx* cctx,
                                          void* dst, size_t dstCapacity,
                                          const void* src, size_t srcSize,
@@ -1725,7 +2025,7 @@ ZSTDLIB_STATIC_API size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const vo
 * See the comments on that enum for an explanation of the feature. */
 #define ZSTD_c_forceAttachDict ZSTD_c_experimentalParam4
 
-/* Controlled with ZSTD_paramSwitch_e enum.
+/* Controlled with ZSTD_ParamSwitch_e enum.
 * Default is ZSTD_ps_auto.
 * Set to ZSTD_ps_disable to never compress literals.
 * Set to ZSTD_ps_enable to always compress literals. (Note: uncompressed literals
@@ -1737,11 +2037,6 @@ ZSTDLIB_STATIC_API size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const vo
 */
 #define ZSTD_c_literalCompressionMode ZSTD_c_experimentalParam5
 
-/* Tries to fit compressed block size to be around targetCBlockSize.
- * No target when targetCBlockSize == 0.
- * There is no guarantee on compressed block size (default:0) */
-#define ZSTD_c_targetCBlockSize ZSTD_c_experimentalParam6
-
 /* User's best guess of source size.
 * Hint is not valid when srcSizeHint == 0.
 * There is no guarantee that hint is close to actual source size,
@@ -1808,13 +2103,16 @@ ZSTDLIB_STATIC_API size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const vo
 * Experimental parameter.
 * Default is 0 == disabled. Set to 1 to enable.
 *
- * Tells the compressor that the ZSTD_inBuffer will ALWAYS be the same
- * between calls, except for the modifications that zstd makes to pos (the
- * caller must not modify pos). This is checked by the compressor, and
- * compression will fail if it ever changes. This means the only flush
- * mode that makes sense is ZSTD_e_end, so zstd will error if ZSTD_e_end
- * is not used. The data in the ZSTD_inBuffer in the range [src, src + pos)
- * MUST not be modified during compression or you will get data corruption.
+ * Tells the compressor that input data presented with ZSTD_inBuffer
+ * will ALWAYS be the same between calls.
+ * Technically, the @src pointer must never be changed,
+ * and the @pos field can only be updated by zstd.
+ * However, it's possible to increase the @size field,
+ * allowing scenarios where more data can be appended after compression starts.
+ * These conditions are checked by the compressor,
+ * and compression will fail if they are not respected.
+ * Also, data in the ZSTD_inBuffer within the range [src, src + pos)
+ * MUST not be modified during compression or it will result in data corruption.
 *
 * When this flag is enabled zstd won't allocate an input window buffer,
 * because the user guarantees it can reference the ZSTD_inBuffer until
@@ -1822,18 +2120,15 @@ ZSTDLIB_STATIC_API size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const vo
 * large enough to fit a block (see ZSTD_c_stableOutBuffer). This will also
 * avoid the memcpy() from the input buffer to the input window buffer.
 *
- * NOTE: ZSTD_compressStream2() will error if ZSTD_e_end is not used.
- * That means this flag cannot be used with ZSTD_compressStream().
- *
 * NOTE: So long as the ZSTD_inBuffer always points to valid memory, using
 * this flag is ALWAYS memory safe, and will never access out-of-bounds
- * memory. However, compression WILL fail if you violate the preconditions.
+ * memory. However, compression WILL fail if conditions are not respected.
 *
- * WARNING: The data in the ZSTD_inBuffer in the range [dst, dst + pos) MUST
- * not be modified during compression or you will get data corruption. This
- * is because zstd needs to reference data in the ZSTD_inBuffer to find
+ * WARNING: The data in the ZSTD_inBuffer in the range [src, src + pos) MUST
+ * not be modified during compression or it will result in data corruption.
+ * This is because zstd needs to reference data in the ZSTD_inBuffer to find
 * matches. Normally zstd maintains its own window buffer for this purpose,
- * but passing this flag tells zstd to use the user provided buffer.
+ * but passing this flag tells zstd to rely on the user-provided buffer instead.
 */
 #define ZSTD_c_stableInBuffer ZSTD_c_experimentalParam9
 
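An illustrative sketch (not from upstream) of single-shot streaming with a stable input buffer; it assumes `src` stays valid and unmodified until the frame is complete, and the names are placeholders.

    #include <zstd.h>

    /* Sketch: compress with ZSTD_c_stableInBuffer, promising zstd that
     * `src` stays valid and untouched for the whole compression. */
    size_t compress_stable_input(ZSTD_CCtx* cctx,
                                 void* dst, size_t dstCap,
                                 const void* src, size_t srcSize)
    {
        ZSTD_CCtx_reset(cctx, ZSTD_reset_session_only);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_stableInBuffer, 1);
        ZSTD_inBuffer  in  = { src, srcSize, 0 };
        ZSTD_outBuffer out = { dst, dstCap, 0 };
        size_t const r = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
        if (ZSTD_isError(r)) return r;
        return out.pos;  /* r == 0 means the frame is complete */
    }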
@@ -1871,22 +2166,46 @@ ZSTDLIB_STATIC_API size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const vo
 /* ZSTD_c_validateSequences
 * Default is 0 == disabled. Set to 1 to enable sequence validation.
 *
- * For use with sequence compression API: ZSTD_compressSequences().
- * Designates whether or not we validate sequences provided to ZSTD_compressSequences()
+ * For use with sequence compression API: ZSTD_compressSequences*().
+ * Designates whether or not provided sequences are validated within ZSTD_compressSequences*()
 * during function execution.
 *
- * Without validation, providing a sequence that does not conform to the zstd spec will cause
- * undefined behavior, and may produce a corrupted block.
+ * When Sequence validation is disabled (default), Sequences are compressed as-is,
+ * so they must be correct, otherwise it would result in a corruption error.
 *
- * With validation enabled, if sequence is invalid (see doc/zstd_compression_format.md for
+ * Sequence validation adds some protection, by ensuring that all values respect boundary conditions.
+ * If a Sequence is detected invalid (see doc/zstd_compression_format.md for
 * specifics regarding offset/matchlength requirements) then the function will bail out and
 * return an error.
- *
 */
 #define ZSTD_c_validateSequences ZSTD_c_experimentalParam12
 
-/* ZSTD_c_useBlockSplitter
- * Controlled with ZSTD_paramSwitch_e enum.
+/* ZSTD_c_blockSplitterLevel
+ * note: this parameter only influences the first splitter stage,
+ *       which is active before producing the sequences.
+ *       ZSTD_c_splitAfterSequences controls the next splitter stage,
+ *       which is active after sequence production.
+ *       Note that both can be combined.
+ * Allowed values are between 0 and ZSTD_BLOCKSPLITTER_LEVEL_MAX included.
+ * 0 means "auto", which will select a value depending on current ZSTD_c_strategy.
+ * 1 means no splitting.
+ * Then, values from 2 to 6 are sorted in increasing cpu load order.
+ *
+ * Note that currently the first block is never split,
+ * to ensure expansion guarantees in the presence of incompressible data.
+ */
+#define ZSTD_BLOCKSPLITTER_LEVEL_MAX 6
+#define ZSTD_c_blockSplitterLevel ZSTD_c_experimentalParam20
+
+/* ZSTD_c_splitAfterSequences
+ * This is a stronger splitter algorithm,
+ * based on actual sequences previously produced by the selected parser.
+ * It's also slower, and as a consequence, mostly used for high compression levels.
+ * While the post-splitter does overlap with the pre-splitter,
+ * both can nonetheless be combined,
+ * notably with ZSTD_c_blockSplitterLevel at ZSTD_BLOCKSPLITTER_LEVEL_MAX,
+ * resulting in higher compression ratio than just one of them.
+ *
 * Default is ZSTD_ps_auto.
 * Set to ZSTD_ps_disable to never use block splitter.
 * Set to ZSTD_ps_enable to always use block splitter.
@@ -1894,10 +2213,10 @@ ZSTDLIB_STATIC_API size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const vo
 * By default, in ZSTD_ps_auto, the library will decide at runtime whether to use
 * block splitting based on the compression parameters.
 */
-#define ZSTD_c_useBlockSplitter ZSTD_c_experimentalParam13
+#define ZSTD_c_splitAfterSequences ZSTD_c_experimentalParam13
 
 /* ZSTD_c_useRowMatchFinder
- * Controlled with ZSTD_paramSwitch_e enum.
+ * Controlled with ZSTD_ParamSwitch_e enum.
 * Default is ZSTD_ps_auto.
 * Set to ZSTD_ps_disable to never use row-based matchfinder.
 * Set to ZSTD_ps_enable to force usage of row-based matchfinder.
@@ -1928,6 +2247,80 @@ ZSTDLIB_STATIC_API size_t ZSTD_CCtx_refPrefix_advanced(ZSTD_CCtx* cctx, const vo
 */
 #define ZSTD_c_deterministicRefPrefix ZSTD_c_experimentalParam15
 
+/* ZSTD_c_prefetchCDictTables
+ * Controlled with ZSTD_ParamSwitch_e enum. Default is ZSTD_ps_auto.
+ *
+ * In some situations, zstd uses CDict tables in-place rather than copying them
+ * into the working context. (See docs on ZSTD_dictAttachPref_e above for details).
+ * In such situations, compression speed is seriously impacted when CDict tables are
+ * "cold" (outside CPU cache). This parameter instructs zstd to prefetch CDict tables
+ * when they are used in-place.
+ *
+ * For sufficiently small inputs, the cost of the prefetch will outweigh the benefit.
+ * For sufficiently large inputs, zstd will by default memcpy() CDict tables
+ * into the working context, so there is no need to prefetch. This parameter is
+ * targeted at a middle range of input sizes, where a prefetch is cheap enough to be
+ * useful but memcpy() is too expensive. The exact range of input sizes where this
+ * makes sense is best determined by careful experimentation.
+ *
+ * Note: for this parameter, ZSTD_ps_auto is currently equivalent to ZSTD_ps_disable,
+ * but in the future zstd may conditionally enable this feature via an auto-detection
+ * heuristic for cold CDicts.
+ * Use ZSTD_ps_disable to opt out of prefetching under any circumstances.
+ */
+#define ZSTD_c_prefetchCDictTables ZSTD_c_experimentalParam16
+
+/* ZSTD_c_enableSeqProducerFallback
+ * Allowed values are 0 (disable) and 1 (enable). The default setting is 0.
+ *
+ * Controls whether zstd will fall back to an internal sequence producer if an
+ * external sequence producer is registered and returns an error code. This fallback
+ * is block-by-block: the internal sequence producer will only be called for blocks
+ * where the external sequence producer returns an error code. Fallback parsing will
+ * follow any other cParam settings, such as compression level, the same as in a
+ * normal (fully-internal) compression operation.
+ *
+ * The user is strongly encouraged to read the full Block-Level Sequence Producer API
+ * documentation (below) before setting this parameter. */
+#define ZSTD_c_enableSeqProducerFallback ZSTD_c_experimentalParam17
+
+/* ZSTD_c_maxBlockSize
+ * Allowed values are between 1KB and ZSTD_BLOCKSIZE_MAX (128KB).
+ * The default is ZSTD_BLOCKSIZE_MAX, and setting to 0 will set to the default.
+ *
+ * This parameter can be used to set an upper bound on the blocksize
+ * that overrides the default ZSTD_BLOCKSIZE_MAX. It cannot be used to set upper
+ * bounds greater than ZSTD_BLOCKSIZE_MAX or bounds lower than 1KB (will make
+ * compressBound() inaccurate). Only currently meant to be used for testing.
+ */
+#define ZSTD_c_maxBlockSize ZSTD_c_experimentalParam18
+
+/* ZSTD_c_repcodeResolution
+ * This parameter only has an effect if ZSTD_c_blockDelimiters is
+ * set to ZSTD_sf_explicitBlockDelimiters (may change in the future).
+ *
+ * This parameter affects how zstd parses external sequences,
+ * provided via the ZSTD_compressSequences*() API
+ * or from an external block-level sequence producer.
+ *
+ * If set to ZSTD_ps_enable, the library will check for repeated offsets within
+ * external sequences, even if those repcodes are not explicitly indicated in
+ * the "rep" field. Note that this is the only way to exploit repcode matches
+ * while using compressSequences*() or an external sequence producer, since zstd
+ * currently ignores the "rep" field of external sequences.
+ *
+ * If set to ZSTD_ps_disable, the library will not exploit repeated offsets in
+ * external sequences, regardless of whether the "rep" field has been set. This
+ * reduces sequence compression overhead by about 25% while sacrificing some
+ * compression ratio.
+ *
+ * The default value is ZSTD_ps_auto, for which the library will enable/disable
+ * based on compression level (currently: level<10 disables, level>=10 enables).
+ */
+#define ZSTD_c_repcodeResolution ZSTD_c_experimentalParam19
+#define ZSTD_c_searchForExternalRepcodes ZSTD_c_experimentalParam19 /* older name */
+
+
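A tiny illustrative sketch (not from upstream) of forcing repcode resolution on, independent of compression level; it assumes explicit block delimiters, as the documentation above requires.

    #include <zstd.h>

    /* Sketch: opt into repcode search for externally provided sequences. */
    void enable_repcode_resolution(ZSTD_CCtx* cctx)
    {
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_blockDelimiters,
                               ZSTD_sf_explicitBlockDelimiters);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_repcodeResolution, ZSTD_ps_enable);
    }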
 /*! ZSTD_CCtx_getParameter() :
 *  Get the requested compression parameter value, selected by enum ZSTD_cParameter,
 *  and store it into int* value.
@@ -2084,7 +2477,7 @@ ZSTDLIB_STATIC_API size_t ZSTD_DCtx_getParameter(ZSTD_DCtx* dctx, ZSTD_dParamete
 * in the range [dst, dst + pos) MUST not be modified during decompression
 * or you will get data corruption.
 *
- * When this flags is enabled zstd won't allocate an output buffer, because
+ * When this flag is enabled zstd won't allocate an output buffer, because
 * it can write directly to the ZSTD_outBuffer, but it will still allocate
 * an input buffer large enough to fit any compressed block. This will also
 * avoid the memcpy() from the internal output buffer to the ZSTD_outBuffer.
@@ -2137,6 +2530,33 @@ ZSTDLIB_STATIC_API size_t ZSTD_DCtx_getParameter(ZSTD_DCtx* dctx, ZSTD_dParamete
 */
 #define ZSTD_d_refMultipleDDicts ZSTD_d_experimentalParam4
 
+/* ZSTD_d_disableHuffmanAssembly
+ * Set to 1 to disable the Huffman assembly implementation.
+ * The default value is 0, which allows zstd to use the Huffman assembly
+ * implementation if available.
+ *
+ * This parameter can be used to disable Huffman assembly at runtime.
+ * If you want to disable it at compile time you can define the macro
+ * ZSTD_DISABLE_ASM.
+ */
+#define ZSTD_d_disableHuffmanAssembly ZSTD_d_experimentalParam5
+
+/* ZSTD_d_maxBlockSize
+ * Allowed values are between 1KB and ZSTD_BLOCKSIZE_MAX (128KB).
+ * The default is ZSTD_BLOCKSIZE_MAX, and setting to 0 will set to the default.
+ *
+ * Forces the decompressor to reject blocks whose content size is
+ * larger than the configured maxBlockSize. When maxBlockSize is
+ * larger than the windowSize, the windowSize is used instead.
+ * This saves memory on the decoder when you know all blocks are small.
+ *
+ * This option is typically used in conjunction with ZSTD_c_maxBlockSize.
+ *
+ * WARNING: This causes the decoder to reject otherwise valid frames
+ * that have block sizes larger than the configured maxBlockSize.
+ */
+#define ZSTD_d_maxBlockSize ZSTD_d_experimentalParam6
+
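An illustrative pairing of the two limits (not from upstream), so a memory-constrained decoder can rely on small blocks; the 16 KB figure is an arbitrary example value.

    #include <zstd.h>

    /* Sketch: producer caps block size; consumer enforces the same cap. */
    void configure_small_blocks(ZSTD_CCtx* cctx, ZSTD_DCtx* dctx)
    {
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_maxBlockSize, 16 * 1024);
        ZSTD_DCtx_setParameter(dctx, ZSTD_d_maxBlockSize, 16 * 1024);
    }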
 
 /*! ZSTD_DCtx_setFormat() :
 *  This function is REDUNDANT. Prefer ZSTD_DCtx_setParameter().
@@ -2145,6 +2565,7 @@ ZSTDLIB_STATIC_API size_t ZSTD_DCtx_getParameter(ZSTD_DCtx* dctx, ZSTD_dParamete
 *  such ZSTD_f_zstd1_magicless for example.
 * @return : 0, or an error code (which can be tested using ZSTD_isError()). */
 ZSTD_DEPRECATED("use ZSTD_DCtx_setParameter() instead")
+ZSTDLIB_STATIC_API
 size_t ZSTD_DCtx_setFormat(ZSTD_DCtx* dctx, ZSTD_format_e format);
 
 /*! ZSTD_decompressStream_simpleArgs() :
@@ -2181,6 +2602,7 @@ ZSTDLIB_STATIC_API size_t ZSTD_decompressStream_simpleArgs (
 *  This prototype will generate compilation warnings. */
 ZSTD_DEPRECATED("use ZSTD_CCtx_reset, see zstd.h for detailed instructions")
+ZSTDLIB_STATIC_API
 size_t ZSTD_initCStream_srcSize(ZSTD_CStream* zcs,
                                 int compressionLevel,
                                 unsigned long long pledgedSrcSize);
@@ -2198,17 +2620,15 @@ size_t ZSTD_initCStream_srcSize(ZSTD_CStream* zcs,
 *  This prototype will generate compilation warnings. */
 ZSTD_DEPRECATED("use ZSTD_CCtx_reset, see zstd.h for detailed instructions")
+ZSTDLIB_STATIC_API
 size_t ZSTD_initCStream_usingDict(ZSTD_CStream* zcs,
                                   const void* dict, size_t dictSize,
                                   int compressionLevel);
 
 /*! ZSTD_initCStream_advanced() :
- * This function is DEPRECATED, and is approximately equivalent to:
+ * This function is DEPRECATED, and is equivalent to:
 *     ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only);
- *     // Pseudocode: Set each zstd parameter and leave the rest as-is.
- *     for ((param, value) : params) {
- *         ZSTD_CCtx_setParameter(zcs, param, value);
- *     }
+ *     ZSTD_CCtx_setParams(zcs, params);
 *     ZSTD_CCtx_setPledgedSrcSize(zcs, pledgedSrcSize);
 *     ZSTD_CCtx_loadDictionary(zcs, dict, dictSize);
 *
@@ -2218,6 +2638,7 @@ size_t ZSTD_initCStream_usingDict(ZSTD_CStream* zcs,
 *  This prototype will generate compilation warnings. */
 ZSTD_DEPRECATED("use ZSTD_CCtx_reset, see zstd.h for detailed instructions")
+ZSTDLIB_STATIC_API
 size_t ZSTD_initCStream_advanced(ZSTD_CStream* zcs,
                                  const void* dict, size_t dictSize,
                                  ZSTD_parameters params,
@@ -2232,15 +2653,13 @@ size_t ZSTD_initCStream_advanced(ZSTD_CStream* zcs,
 *  This prototype will generate compilation warnings. */
 ZSTD_DEPRECATED("use ZSTD_CCtx_reset and ZSTD_CCtx_refCDict, see zstd.h for detailed instructions")
+ZSTDLIB_STATIC_API
 size_t ZSTD_initCStream_usingCDict(ZSTD_CStream* zcs, const ZSTD_CDict* cdict);
 
 /*! ZSTD_initCStream_usingCDict_advanced() :
- * This function is DEPRECATED, and is approximately equivalent to:
+ * This function is DEPRECATED, and is equivalent to:
 *     ZSTD_CCtx_reset(zcs, ZSTD_reset_session_only);
- *     // Pseudocode: Set each zstd frame parameter and leave the rest as-is.
- *     for ((fParam, value) : fParams) {
- *         ZSTD_CCtx_setParameter(zcs, fParam, value);
- *     }
+ *     ZSTD_CCtx_setFParams(zcs, fParams);
 *     ZSTD_CCtx_setPledgedSrcSize(zcs, pledgedSrcSize);
 *     ZSTD_CCtx_refCDict(zcs, cdict);
 *
@@ -2250,6 +2669,7 @@ size_t ZSTD_initCStream_usingCDict(ZSTD_CStream* zcs, const ZSTD_CDict* cdict);
 *  This prototype will generate compilation warnings. */
 ZSTD_DEPRECATED("use ZSTD_CCtx_reset and ZSTD_CCtx_refCDict, see zstd.h for detailed instructions")
+ZSTDLIB_STATIC_API
 size_t ZSTD_initCStream_usingCDict_advanced(ZSTD_CStream* zcs,
                                             const ZSTD_CDict* cdict,
                                             ZSTD_frameParameters fParams,
@@ -2264,7 +2684,7 @@ size_t ZSTD_initCStream_usingCDict_advanced(ZSTD_CStream* zcs,
 *  explicitly specified.
 *
 *  start a new frame, using same parameters from previous frame.
- *  This is typically useful to skip dictionary loading stage, since it will re-use it in-place.
+ *  This is typically useful to skip dictionary loading stage, since it will reuse it in-place.
 *  Note that zcs must be init at least once before using ZSTD_resetCStream().
 *  If pledgedSrcSize is not known at reset time, use macro ZSTD_CONTENTSIZE_UNKNOWN.
 *  If pledgedSrcSize > 0, its value must be correct, as it will be written in header, and controlled at the end.
@@ -2274,6 +2694,7 @@ size_t ZSTD_initCStream_usingCDict_advanced(ZSTD_CStream* zcs,
 *  This prototype will generate compilation warnings. */
 ZSTD_DEPRECATED("use ZSTD_CCtx_reset, see zstd.h for detailed instructions")
+ZSTDLIB_STATIC_API
 size_t ZSTD_resetCStream(ZSTD_CStream* zcs, unsigned long long pledgedSrcSize);
 
 
@@ -2319,8 +2740,8 @@ ZSTDLIB_STATIC_API size_t ZSTD_toFlushNow(ZSTD_CCtx* cctx);
 *     ZSTD_DCtx_loadDictionary(zds, dict, dictSize);
 *
 * note: no dictionary will be used if dict == NULL or dictSize < 8
- * Note : this prototype will be marked as deprecated and generate compilation warnings on reaching v1.5.x
 */
+ZSTD_DEPRECATED("use ZSTD_DCtx_reset + ZSTD_DCtx_loadDictionary, see zstd.h for detailed instructions")
 ZSTDLIB_STATIC_API size_t ZSTD_initDStream_usingDict(ZSTD_DStream* zds, const void* dict, size_t dictSize);
 
 /*!
@@ -2330,8 +2751,8 @@ ZSTDLIB_STATIC_API size_t ZSTD_initDStream_usingDict(ZSTD_DStream* zds, const vo
 *     ZSTD_DCtx_refDDict(zds, ddict);
 *
 * note : ddict is referenced, it must outlive decompression session
- * Note : this prototype will be marked as deprecated and generate compilation warnings on reaching v1.5.x
 */
+ZSTD_DEPRECATED("use ZSTD_DCtx_reset + ZSTD_DCtx_refDDict, see zstd.h for detailed instructions")
 ZSTDLIB_STATIC_API size_t ZSTD_initDStream_usingDDict(ZSTD_DStream* zds, const ZSTD_DDict* ddict);
 
 /*!
@@ -2339,18 +2760,202 @@ ZSTDLIB_STATIC_API size_t ZSTD_initDStream_usingDDict(ZSTD_DStream* zds, const Z
 *
 *     ZSTD_DCtx_reset(zds, ZSTD_reset_session_only);
 *
- * re-use decompression parameters from previous init; saves dictionary loading
- * Note : this prototype will be marked as deprecated and generate compilation warnings on reaching v1.5.x
+ * reuse decompression parameters from previous init; saves dictionary loading
 */
+ZSTD_DEPRECATED("use ZSTD_DCtx_reset, see zstd.h for detailed instructions")
 ZSTDLIB_STATIC_API size_t ZSTD_resetDStream(ZSTD_DStream* zds);
 
 
+/* ********************* BLOCK-LEVEL SEQUENCE PRODUCER API *********************
+ *
+ * *** OVERVIEW ***
+ * The Block-Level Sequence Producer API allows users to provide their own custom
+ * sequence producer which libzstd invokes to process each block. The produced list
+ * of sequences (literals and matches) is then post-processed by libzstd to produce
+ * valid compressed blocks.
+ *
+ * This block-level offload API is a more granular complement of the existing
+ * frame-level offload API compressSequences() (introduced in v1.5.1). It offers
+ * an easier migration story for applications already integrated with libzstd: the
+ * user application continues to invoke the same compression functions
+ * ZSTD_compress2() or ZSTD_compressStream2() as usual, and transparently benefits
+ * from the specific advantages of the external sequence producer. For example,
+ * the sequence producer could be tuned to take advantage of known characteristics
+ * of the input, to offer better speed / ratio, or could leverage hardware
+ * acceleration not available within libzstd itself.
+ *
+ * See contrib/externalSequenceProducer for an example program employing the
+ * Block-Level Sequence Producer API.
+ *
+ * *** USAGE ***
+ * The user is responsible for implementing a function of type
+ * ZSTD_sequenceProducer_F. For each block, zstd will pass the following
+ * arguments to the user-provided function:
+ *
+ *   - sequenceProducerState: a pointer to a user-managed state for the sequence
+ *     producer.
+ *
+ *   - outSeqs, outSeqsCapacity: an output buffer for the sequence producer.
+ *     outSeqsCapacity is guaranteed >= ZSTD_sequenceBound(srcSize). The memory
+ *     backing outSeqs is managed by the CCtx.
+ *
+ *   - src, srcSize: an input buffer for the sequence producer to parse.
+ *     srcSize is guaranteed to be <= ZSTD_BLOCKSIZE_MAX.
+ *
+ *   - dict, dictSize: a history buffer, which may be empty, which the sequence
+ *     producer may reference as it parses the src buffer. Currently, zstd will
+ *     always pass dictSize == 0 into external sequence producers, but this will
+ *     change in the future.
+ *
+ *   - compressionLevel: a signed integer representing the zstd compression level
+ *     set by the user for the current operation. The sequence producer may choose
+ *     to use this information to change its compression strategy and speed/ratio
+ *     tradeoff. Note: the compression level does not reflect zstd parameters set
+ *     through the advanced API.
+ *
+ *   - windowSize: a size_t representing the maximum allowed offset for external
+ *     sequences. Note that sequence offsets are sometimes allowed to exceed the
+ *     windowSize if a dictionary is present, see doc/zstd_compression_format.md
+ *     for details.
+ *
+ * The user-provided function shall return a size_t representing the number of
+ * sequences written to outSeqs. This return value will be treated as an error
+ * code if it is greater than outSeqsCapacity. The return value must be non-zero
+ * if srcSize is non-zero. The ZSTD_SEQUENCE_PRODUCER_ERROR macro is provided
+ * for convenience, but any value greater than outSeqsCapacity will be treated as
+ * an error code.
+ *
+ * If the user-provided function does not return an error code, the sequences
+ * written to outSeqs must be a valid parse of the src buffer. Data corruption may
+ * occur if the parse is not valid. A parse is defined to be valid if the
+ * following conditions hold:
+ *   - The sum of matchLengths and literalLengths must equal srcSize.
+ *   - All sequences in the parse, except for the final sequence, must have
+ *     matchLength >= ZSTD_MINMATCH_MIN. The final sequence must have
+ *     matchLength >= ZSTD_MINMATCH_MIN or matchLength == 0.
+ *   - All offsets must respect the windowSize parameter as specified in
+ *     doc/zstd_compression_format.md.
+ *   - If the final sequence has matchLength == 0, it must also have offset == 0.
+ *
+ * zstd will only validate these conditions (and fail compression if they do not
+ * hold) if the ZSTD_c_validateSequences cParam is enabled. Note that sequence
+ * validation has a performance cost.
+ *
+ * If the user-provided function returns an error, zstd will either fall back
+ * to an internal sequence producer or fail the compression operation. The user can
+ * choose between the two behaviors by setting the ZSTD_c_enableSeqProducerFallback
+ * cParam. Fallback compression will follow any other cParam settings, such as
+ * compression level, the same as in a normal compression operation.
+ *
+ * The user shall instruct zstd to use a particular ZSTD_sequenceProducer_F
+ * function by calling
+ *         ZSTD_registerSequenceProducer(cctx,
+ *                                       sequenceProducerState,
+ *                                       sequenceProducer)
+ * This setting will persist until the next parameter reset of the CCtx.
+ *
+ * The sequenceProducerState must be initialized by the user before calling
+ * ZSTD_registerSequenceProducer(). The user is responsible for destroying the
+ * sequenceProducerState.
+ *
+ * *** LIMITATIONS ***
+ * This API is compatible with all zstd compression APIs which respect advanced parameters.
+ * However, there are three limitations:
+ *
+ * First, the ZSTD_c_enableLongDistanceMatching cParam is not currently supported.
+ * COMPRESSION WILL FAIL if it is enabled and the user tries to compress with a block-level
+ * external sequence producer.
+ *   - Note that ZSTD_c_enableLongDistanceMatching is auto-enabled by default in some
+ *     cases (see its documentation for details). Users must explicitly set
+ *     ZSTD_c_enableLongDistanceMatching to ZSTD_ps_disable in such cases if an external
+ *     sequence producer is registered.
+ *   - As of this writing, ZSTD_c_enableLongDistanceMatching is disabled by default
+ *     whenever ZSTD_c_windowLog < 128MB, but that's subject to change. Users should
+ *     check the docs on ZSTD_c_enableLongDistanceMatching whenever the Block-Level Sequence
+ *     Producer API is used in conjunction with advanced settings (like ZSTD_c_windowLog).
+ *
+ * Second, history buffers are not currently supported. Concretely, zstd will always pass
+ * dictSize == 0 to the external sequence producer (for now). This has two implications:
+ *   - Dictionaries are not currently supported. Compression will *not* fail if the user
+ *     references a dictionary, but the dictionary won't have any effect.
+ *   - Stream history is not currently supported. All advanced compression APIs, including
+ *     streaming APIs, work with external sequence producers, but each block is treated as
+ *     an independent chunk without history from previous blocks.
+ *
+ * Third, multi-threading within a single compression is not currently supported. In other words,
+ * COMPRESSION WILL FAIL if ZSTD_c_nbWorkers > 0 and an external sequence producer is registered.
+ * Multi-threading across compressions is fine: simply create one CCtx per thread.
+ *
+ * Long-term, we plan to overcome all three limitations. There is no technical blocker to
+ * overcoming them. It is purely a question of engineering effort.
+ */
+
+#define ZSTD_SEQUENCE_PRODUCER_ERROR ((size_t)(-1))
+
+typedef size_t (*ZSTD_sequenceProducer_F) (
+    void* sequenceProducerState,
+    ZSTD_Sequence* outSeqs, size_t outSeqsCapacity,
+    const void* src, size_t srcSize,
+    const void* dict, size_t dictSize,
+    int compressionLevel,
+    size_t windowSize
+);
+
+/*! ZSTD_registerSequenceProducer() :
+ * Instruct zstd to use a block-level external sequence producer function.
+ *
+ * The sequenceProducerState must be initialized by the caller, and the caller is
+ * responsible for managing its lifetime. This parameter is sticky across
+ * compressions. It will remain set until the user explicitly resets compression
+ * parameters.
+ *
+ * Sequence producer registration is considered to be an "advanced parameter",
+ * part of the "advanced API". This means it will only have an effect on compression
+ * APIs which respect advanced parameters, such as compress2() and compressStream2().
+ * Older compression APIs such as compressCCtx(), which predate the introduction of
+ * "advanced parameters", will ignore any external sequence producer setting.
+ *
+ * The sequence producer can be "cleared" by registering a NULL function pointer. This
+ * removes all limitations described above in the "LIMITATIONS" section of the API docs.
+ *
+ * The user is strongly encouraged to read the full API documentation (above) before
+ * calling this function. */
+ZSTDLIB_STATIC_API void
+ZSTD_registerSequenceProducer(
+    ZSTD_CCtx* cctx,
+    void* sequenceProducerState,
+    ZSTD_sequenceProducer_F sequenceProducer
+);
+
+/*! ZSTD_CCtxParams_registerSequenceProducer() :
+ * Same as ZSTD_registerSequenceProducer(), but operates on ZSTD_CCtx_params.
+ * This is used for accurate size estimation with ZSTD_estimateCCtxSize_usingCCtxParams(),
+ * which is needed when creating a ZSTD_CCtx with ZSTD_initStaticCCtx().
+ *
+ * If you are using the external sequence producer API in a scenario where ZSTD_initStaticCCtx()
+ * is required, then this function is for you. Otherwise, you probably don't need it.
+ *
+ * See tests/zstreamtest.c for example usage. */
+ZSTDLIB_STATIC_API void
+ZSTD_CCtxParams_registerSequenceProducer(
+  ZSTD_CCtx_params* params,
+  void* sequenceProducerState,
+  ZSTD_sequenceProducer_F sequenceProducer
+);
+
+
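To make the producer contract concrete, a hedged skeleton (not from upstream) of a trivial producer that declares each block as all literals. Per the validity rules above, a single final sequence with offset == 0 and matchLength == 0 whose litLength equals srcSize is always a valid parse.

    #include <zstd.h>

    /* Sketch: a minimal, always-valid sequence producer. */
    static size_t allLiteralsProducer(void* state,
                                      ZSTD_Sequence* outSeqs, size_t outSeqsCapacity,
                                      const void* src, size_t srcSize,
                                      const void* dict, size_t dictSize,
                                      int compressionLevel, size_t windowSize)
    {
        (void)state; (void)src; (void)dict; (void)dictSize;
        (void)compressionLevel; (void)windowSize;
        if (outSeqsCapacity < 1) return ZSTD_SEQUENCE_PRODUCER_ERROR;
        outSeqs[0].offset = 0;                    /* final sequence: no match */
        outSeqs[0].litLength = (unsigned)srcSize; /* whole block is literals */
        outSeqs[0].matchLength = 0;
        outSeqs[0].rep = 0;                       /* currently ignored by zstd */
        return 1;                                 /* one sequence written */
    }

    /* Registration (the producer state is unused here, hence NULL):
     *   ZSTD_registerSequenceProducer(cctx, NULL, allLiteralsProducer);
     */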
 /* *******************************************************************
-*  Buffer-less and synchronous inner streaming functions
+*  Buffer-less and synchronous inner streaming functions (DEPRECATED)
+*
+*  This API is deprecated, and will be removed in a future version.
+*  It allows streaming (de)compression with user allocated buffers.
+*  However, it is hard to use, and not as well tested as the rest of
+*  our API.
*
-*  This is an advanced API, giving full control over buffer management, for users which need direct control over memory.
-*  But it's also a complex one, with several restrictions, documented below.
-*  Prefer normal streaming API for an easier experience.
+*  Please use the normal streaming API instead: ZSTD_compressStream2,
+*  and ZSTD_decompressStream.
+*  If there is functionality that you need, but it doesn't provide,
+*  please open an issue on our GitHub.
 ********************************************************************* */
 
 /*
@@ -2358,11 +2963,10 @@ ZSTDLIB_STATIC_API size_t ZSTD_resetDStream(ZSTD_DStream* zds);
 
   A ZSTD_CCtx object is required to track streaming operations.
   Use ZSTD_createCCtx() / ZSTD_freeCCtx() to manage resource.
-  ZSTD_CCtx object can be re-used multiple times within successive compression operations.
+  ZSTD_CCtx object can be reused multiple times within successive compression operations.
 
   Start by initializing a context.
   Use ZSTD_compressBegin(), or ZSTD_compressBegin_usingDict() for dictionary compression.
-  It's also possible to duplicate a reference context which has already been initialized, using ZSTD_copyCCtx()
 
   Then, consume your input using ZSTD_compressContinue().
   There are some important considerations to keep in mind when using this advanced function :
@@ -2380,39 +2984,49 @@ ZSTDLIB_STATIC_API size_t ZSTD_resetDStream(ZSTD_DStream* zds);
   It's possible to use srcSize==0, in which case, it will write a final empty block to end the frame.
   Without last block mark, frames are considered unfinished (hence corrupted) by compliant decoders.
 
-  `ZSTD_CCtx` object can be re-used (ZSTD_compressBegin()) to compress again.
+  `ZSTD_CCtx` object can be reused (ZSTD_compressBegin()) to compress again.
 */
 
 /*=====   Buffer-less streaming compression functions  =====*/
+ZSTD_DEPRECATED("The buffer-less API is deprecated in favor of the normal streaming API. See docs.")
 ZSTDLIB_STATIC_API size_t ZSTD_compressBegin(ZSTD_CCtx* cctx, int compressionLevel);
+ZSTD_DEPRECATED("The buffer-less API is deprecated in favor of the normal streaming API. See docs.")
 ZSTDLIB_STATIC_API size_t ZSTD_compressBegin_usingDict(ZSTD_CCtx* cctx, const void* dict, size_t dictSize, int compressionLevel);
+ZSTD_DEPRECATED("The buffer-less API is deprecated in favor of the normal streaming API. See docs.")
 ZSTDLIB_STATIC_API size_t ZSTD_compressBegin_usingCDict(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict); /*< note: fails if cdict==NULL */
-ZSTDLIB_STATIC_API size_t ZSTD_copyCCtx(ZSTD_CCtx* cctx, const ZSTD_CCtx* preparedCCtx, unsigned long long pledgedSrcSize); /*< note: if pledgedSrcSize is not known, use ZSTD_CONTENTSIZE_UNKNOWN */
 
+ZSTD_DEPRECATED("This function will likely be removed in a future release. It is misleading and has very limited utility.")
+ZSTDLIB_STATIC_API
+size_t ZSTD_copyCCtx(ZSTD_CCtx* cctx, const ZSTD_CCtx* preparedCCtx, unsigned long long pledgedSrcSize); /*< note: if pledgedSrcSize is not known, use ZSTD_CONTENTSIZE_UNKNOWN */
+
+ZSTD_DEPRECATED("The buffer-less API is deprecated in favor of the normal streaming API. See docs.")
 ZSTDLIB_STATIC_API size_t ZSTD_compressContinue(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize);
See docs.") ZSTDLIB_STATIC_API size_t ZSTD_compressEnd(ZSTD_CCtx* cctx, void* dst, siz= e_t dstCapacity, const void* src, size_t srcSize); =20 /* The ZSTD_compressBegin_advanced() and ZSTD_compressBegin_usingCDict_adv= anced() are now DEPRECATED and will generate a compiler warning */ ZSTD_DEPRECATED("use advanced API to access custom parameters") +ZSTDLIB_STATIC_API size_t ZSTD_compressBegin_advanced(ZSTD_CCtx* cctx, const void* dict, size= _t dictSize, ZSTD_parameters params, unsigned long long pledgedSrcSize); /*= < pledgedSrcSize : If srcSize is not known at init time, use ZSTD_CONTENTSI= ZE_UNKNOWN */ ZSTD_DEPRECATED("use advanced API to access custom parameters") +ZSTDLIB_STATIC_API size_t ZSTD_compressBegin_usingCDict_advanced(ZSTD_CCtx* const cctx, const= ZSTD_CDict* const cdict, ZSTD_frameParameters const fParams, unsigned long= long const pledgedSrcSize); /* compression parameters are already set wi= thin cdict. pledgedSrcSize must be correct. If srcSize is not known, use ma= cro ZSTD_CONTENTSIZE_UNKNOWN */ /* Buffer-less streaming decompression (synchronous mode) =20 A ZSTD_DCtx object is required to track streaming operations. Use ZSTD_createDCtx() / ZSTD_freeDCtx() to manage it. - A ZSTD_DCtx object can be re-used multiple times. + A ZSTD_DCtx object can be reused multiple times. =20 First typical operation is to retrieve frame parameters, using ZSTD_getF= rameHeader(). Frame header is extracted from the beginning of compressed frame, so pro= viding only the frame's beginning is enough. Data fragment must be large enough to ensure successful decoding. `ZSTD_frameHeaderSize_max` bytes is guaranteed to always be large enough. - @result : 0 : successful decoding, the `ZSTD_frameHeader` structure is c= orrectly filled. - >0 : `srcSize` is too small, please provide at least @result by= tes on next attempt. + result : 0 : successful decoding, the `ZSTD_frameHeader` structure is c= orrectly filled. + >0 : `srcSize` is too small, please provide at least result byt= es on next attempt. errorCode, which can be tested using ZSTD_isError(). =20 - It fills a ZSTD_frameHeader structure with important information to corr= ectly decode the frame, + It fills a ZSTD_FrameHeader structure with important information to corr= ectly decode the frame, such as the dictionary ID, content size, or maximum back-reference dista= nce (`windowSize`). Note that these values could be wrong, either because of data corruption= , or because a 3rd party deliberately spoofs false information. As a consequence, check that values remain within valid application rang= e. @@ -2428,7 +3042,7 @@ size_t ZSTD_compressBegin_usingCDict_advanced(ZSTD_CC= tx* const cctx, const ZSTD_ =20 The most memory efficient way is to use a round buffer of sufficient siz= e. Sufficient size is determined by invoking ZSTD_decodingBufferSize_min(), - which can @return an error code if required value is too large for curre= nt system (in 32-bits mode). + which can return an error code if required value is too large for curren= t system (in 32-bits mode). In a round buffer methodology, ZSTD_decompressContinue() decompresses ea= ch block next to previous one, up to the moment there is not enough room left in the buffer to guarante= e decoding another full block, which maximum size is provided in `ZSTD_frameHeader` structure, field `b= lockSizeMax`. 
@@ -2448,7 +3062,7 @@ size_t ZSTD_compressBegin_usingCDict_advanced(ZSTD_CC= tx* const cctx, const ZSTD_ ZSTD_nextSrcSizeToDecompress() tells how many bytes to provide as 'srcSi= ze' to ZSTD_decompressContinue(). ZSTD_decompressContinue() requires this _exact_ amount of bytes, or it w= ill fail. =20 - @result of ZSTD_decompressContinue() is the number of bytes regenerated w= ithin 'dst' (necessarily <=3D dstCapacity). + result of ZSTD_decompressContinue() is the number of bytes regenerated w= ithin 'dst' (necessarily <=3D dstCapacity). It can be zero : it just means ZSTD_decompressContinue() has decoded som= e metadata item. It can also be an error code, which can be tested with ZSTD_isError(). =20 @@ -2471,27 +3085,7 @@ size_t ZSTD_compressBegin_usingCDict_advanced(ZSTD_C= Ctx* const cctx, const ZSTD_ */ =20 /*=3D=3D=3D=3D=3D Buffer-less streaming decompression functions =3D=3D= =3D=3D=3D*/ -typedef enum { ZSTD_frame, ZSTD_skippableFrame } ZSTD_frameType_e; -typedef struct { - unsigned long long frameContentSize; /* if =3D=3D ZSTD_CONTENTSIZE_UNK= NOWN, it means this field is not available. 0 means "empty" */ - unsigned long long windowSize; /* can be very large, up to <=3D = frameContentSize */ - unsigned blockSizeMax; - ZSTD_frameType_e frameType; /* if =3D=3D ZSTD_skippableFrame,= frameContentSize is the size of skippable content */ - unsigned headerSize; - unsigned dictID; - unsigned checksumFlag; -} ZSTD_frameHeader; =20 -/*! ZSTD_getFrameHeader() : - * decode Frame Header, or requires larger `srcSize`. - * @return : 0, `zfhPtr` is correctly filled, - * >0, `srcSize` is too small, value is wanted `srcSize` amount, - * or an error code, which can be tested using ZSTD_isError() */ -ZSTDLIB_STATIC_API size_t ZSTD_getFrameHeader(ZSTD_frameHeader* zfhPtr, co= nst void* src, size_t srcSize); /*< doesn't consume input */ -/*! ZSTD_getFrameHeader_advanced() : - * same as ZSTD_getFrameHeader(), - * with added capability to select a format (like ZSTD_f_zstd1_magicless)= */ -ZSTDLIB_STATIC_API size_t ZSTD_getFrameHeader_advanced(ZSTD_frameHeader* z= fhPtr, const void* src, size_t srcSize, ZSTD_format_e format); ZSTDLIB_STATIC_API size_t ZSTD_decodingBufferSize_min(unsigned long long w= indowSize, unsigned long long frameContentSize); /*< when frame content si= ze is not known, pass in frameContentSize =3D=3D ZSTD_CONTENTSIZE_UNKNOWN */ =20 ZSTDLIB_STATIC_API size_t ZSTD_decompressBegin(ZSTD_DCtx* dctx); @@ -2502,6 +3096,7 @@ ZSTDLIB_STATIC_API size_t ZSTD_nextSrcSizeToDecompres= s(ZSTD_DCtx* dctx); ZSTDLIB_STATIC_API size_t ZSTD_decompressContinue(ZSTD_DCtx* dctx, void* d= st, size_t dstCapacity, const void* src, size_t srcSize); =20 /* misc */ +ZSTD_DEPRECATED("This function will likely be removed in the next minor re= lease. 
It is misleading and has very limited utility.") ZSTDLIB_STATIC_API void ZSTD_copyDCtx(ZSTD_DCtx* dctx, const ZSTD_DCtx* = preparedDCtx); typedef enum { ZSTDnit_frameHeader, ZSTDnit_blockHeader, ZSTDnit_block, ZS= TDnit_lastBlock, ZSTDnit_checksum, ZSTDnit_skippableFrame } ZSTD_nextInputT= ype_e; ZSTDLIB_STATIC_API ZSTD_nextInputType_e ZSTD_nextInputType(ZSTD_DCtx* dctx= ); @@ -2509,11 +3104,23 @@ ZSTDLIB_STATIC_API ZSTD_nextInputType_e ZSTD_nextIn= putType(ZSTD_DCtx* dctx); =20 =20 =20 -/* =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D */ -/* Block level API */ -/* =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D */ +/* =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D */ +/* Block level API (DEPRECATED) */ +/* =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D */ =20 /*! + + This API is deprecated in favor of the regular compression API. + You can get the frame header down to 2 bytes by setting: + - ZSTD_c_format =3D ZSTD_f_zstd1_magicless + - ZSTD_c_contentSizeFlag =3D 0 + - ZSTD_c_checksumFlag =3D 0 + - ZSTD_c_dictIDFlag =3D 0 + + This API is not as well tested as our normal API, so we recommend not = using it. + We will be removing it in a future version. If the normal API doesn't = provide + the functionality you need, please open a GitHub issue. + Block functions produce and decode raw zstd blocks, without frame meta= data. Frame metadata cost is typically ~12 bytes, which can be non-negligibl= e for very small blocks (< 100 bytes). But users will have to take in charge needed metadata to regenerate da= ta, such as compressed and content sizes. @@ -2524,7 +3131,6 @@ ZSTDLIB_STATIC_API ZSTD_nextInputType_e ZSTD_nextInpu= tType(ZSTD_DCtx* dctx); - It is necessary to init context before starting + compression : any ZSTD_compressBegin*() variant, including with di= ctionary + decompression : any ZSTD_decompressBegin*() variant, including wit= h dictionary - + copyCCtx() and copyDCtx() can be used too - Block size is limited, it must be <=3D ZSTD_getBlockSize() <=3D ZSTD= _BLOCKSIZE_MAX =3D=3D 128 KB + If input is larger than a block size, it's necessary to split inpu= t data into multiple blocks + For inputs larger than a single block, consider using regular ZSTD= _compress() instead. @@ -2541,11 +3147,14 @@ ZSTDLIB_STATIC_API ZSTD_nextInputType_e ZSTD_nextIn= putType(ZSTD_DCtx* dctx); */ =20 /*=3D=3D=3D=3D=3D Raw zstd block functions =3D=3D=3D=3D=3D*/ +ZSTD_DEPRECATED("The block API is deprecated in favor of the normal compre= ssion API. See docs.") ZSTDLIB_STATIC_API size_t ZSTD_getBlockSize (const ZSTD_CCtx* cctx); +ZSTD_DEPRECATED("The block API is deprecated in favor of the normal compre= ssion API. See docs.") ZSTDLIB_STATIC_API size_t ZSTD_compressBlock (ZSTD_CCtx* cctx, void* dst,= size_t dstCapacity, const void* src, size_t srcSize); +ZSTD_DEPRECATED("The block API is deprecated in favor of the normal compre= ssion API. See docs.") ZSTDLIB_STATIC_API size_t ZSTD_decompressBlock(ZSTD_DCtx* dctx, void* dst,= size_t dstCapacity, const void* src, size_t srcSize); +ZSTD_DEPRECATED("The block API is deprecated in favor of the normal compre= ssion API. See docs.") ZSTDLIB_STATIC_API size_t ZSTD_insertBlock (ZSTD_DCtx* dctx, const void= * blockStart, size_t blockSize); /*< insert uncompressed block into `dctx`= history. Useful for multi-blocks decompression. 
*/ =20 =20 #endif /* ZSTD_H_ZSTD_STATIC_LINKING_ONLY */ - diff --git a/lib/zstd/Makefile b/lib/zstd/Makefile index 20f08c644b71..be218b5e0ed5 100644 --- a/lib/zstd/Makefile +++ b/lib/zstd/Makefile @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause # ################################################################ -# Copyright (c) Facebook, Inc. +# Copyright (c) Meta Platforms, Inc. and affiliates. # All rights reserved. # # This source code is licensed under both the BSD-style license (found in = the @@ -26,6 +26,7 @@ zstd_compress-y :=3D \ compress/zstd_lazy.o \ compress/zstd_ldm.o \ compress/zstd_opt.o \ + compress/zstd_preSplit.o \ =20 zstd_decompress-y :=3D \ zstd_decompress_module.o \ diff --git a/lib/zstd/common/allocations.h b/lib/zstd/common/allocations.h new file mode 100644 index 000000000000..16c3d08e8d1a --- /dev/null +++ b/lib/zstd/common/allocations.h @@ -0,0 +1,56 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ +/* + * Copyright (c) Meta Platforms, Inc. and affiliates. + * All rights reserved. + * + * This source code is licensed under both the BSD-style license (found in= the + * LICENSE file in the root directory of this source tree) and the GPLv2 (= found + * in the COPYING file in the root directory of this source tree). + * You may select, at your option, one of the above-listed licenses. + */ + +/* This file provides custom allocation primitives + */ + +#define ZSTD_DEPS_NEED_MALLOC +#include "zstd_deps.h" /* ZSTD_malloc, ZSTD_calloc, ZSTD_free, ZSTD_mems= et */ + +#include "compiler.h" /* MEM_STATIC */ +#define ZSTD_STATIC_LINKING_ONLY +#include /* ZSTD_customMem */ + +#ifndef ZSTD_ALLOCATIONS_H +#define ZSTD_ALLOCATIONS_H + +/* custom memory allocation functions */ + +MEM_STATIC void* ZSTD_customMalloc(size_t size, ZSTD_customMem customMem) +{ + if (customMem.customAlloc) + return customMem.customAlloc(customMem.opaque, size); + return ZSTD_malloc(size); +} + +MEM_STATIC void* ZSTD_customCalloc(size_t size, ZSTD_customMem customMem) +{ + if (customMem.customAlloc) { + /* calloc implemented as malloc+memset; + * not as efficient as calloc, but next best guess for custom mall= oc */ + void* const ptr =3D customMem.customAlloc(customMem.opaque, size); + ZSTD_memset(ptr, 0, size); + return ptr; + } + return ZSTD_calloc(1, size); +} + +MEM_STATIC void ZSTD_customFree(void* ptr, ZSTD_customMem customMem) +{ + if (ptr!=3DNULL) { + if (customMem.customFree) + customMem.customFree(customMem.opaque, ptr); + else + ZSTD_free(ptr); + } +} + +#endif /* ZSTD_ALLOCATIONS_H */ diff --git a/lib/zstd/common/bits.h b/lib/zstd/common/bits.h new file mode 100644 index 000000000000..c5faaa3d7b08 --- /dev/null +++ b/lib/zstd/common/bits.h @@ -0,0 +1,150 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ +/* + * Copyright (c) Meta Platforms, Inc. and affiliates. + * All rights reserved. + * + * This source code is licensed under both the BSD-style license (found in= the + * LICENSE file in the root directory of this source tree) and the GPLv2 (= found + * in the COPYING file in the root directory of this source tree). + * You may select, at your option, one of the above-listed licenses. 
+ */ + +#ifndef ZSTD_BITS_H +#define ZSTD_BITS_H + +#include "mem.h" + +MEM_STATIC unsigned ZSTD_countTrailingZeros32_fallback(U32 val) +{ + assert(val !=3D 0); + { + static const U32 DeBruijnBytePos[32] =3D {0, 1, 28, 2, 29, 14, 24,= 3, + 30, 22, 20, 15, 25, 17, 4,= 8, + 31, 27, 13, 23, 21, 19, 16= , 7, + 26, 12, 18, 6, 11, 5, 10, = 9}; + return DeBruijnBytePos[((U32) ((val & -(S32) val) * 0x077CB531U)) = >> 27]; + } +} + +MEM_STATIC unsigned ZSTD_countTrailingZeros32(U32 val) +{ + assert(val !=3D 0); +#if (__GNUC__ >=3D 4) + return (unsigned)__builtin_ctz(val); +#else + return ZSTD_countTrailingZeros32_fallback(val); +#endif +} + +MEM_STATIC unsigned ZSTD_countLeadingZeros32_fallback(U32 val) +{ + assert(val !=3D 0); + { + static const U32 DeBruijnClz[32] =3D {0, 9, 1, 10, 13, 21, 2, 29, + 11, 14, 16, 18, 22, 25, 3, 30, + 8, 12, 20, 28, 15, 17, 24, 7, + 19, 27, 23, 6, 26, 5, 4, 31}; + val |=3D val >> 1; + val |=3D val >> 2; + val |=3D val >> 4; + val |=3D val >> 8; + val |=3D val >> 16; + return 31 - DeBruijnClz[(val * 0x07C4ACDDU) >> 27]; + } +} + +MEM_STATIC unsigned ZSTD_countLeadingZeros32(U32 val) +{ + assert(val !=3D 0); +#if (__GNUC__ >=3D 4) + return (unsigned)__builtin_clz(val); +#else + return ZSTD_countLeadingZeros32_fallback(val); +#endif +} + +MEM_STATIC unsigned ZSTD_countTrailingZeros64(U64 val) +{ + assert(val !=3D 0); +#if (__GNUC__ >=3D 4) && defined(__LP64__) + return (unsigned)__builtin_ctzll(val); +#else + { + U32 mostSignificantWord =3D (U32)(val >> 32); + U32 leastSignificantWord =3D (U32)val; + if (leastSignificantWord =3D=3D 0) { + return 32 + ZSTD_countTrailingZeros32(mostSignificantWord); + } else { + return ZSTD_countTrailingZeros32(leastSignificantWord); + } + } +#endif +} + +MEM_STATIC unsigned ZSTD_countLeadingZeros64(U64 val) +{ + assert(val !=3D 0); +#if (__GNUC__ >=3D 4) + return (unsigned)(__builtin_clzll(val)); +#else + { + U32 mostSignificantWord =3D (U32)(val >> 32); + U32 leastSignificantWord =3D (U32)val; + if (mostSignificantWord =3D=3D 0) { + return 32 + ZSTD_countLeadingZeros32(leastSignificantWord); + } else { + return ZSTD_countLeadingZeros32(mostSignificantWord); + } + } +#endif +} + +MEM_STATIC unsigned ZSTD_NbCommonBytes(size_t val) +{ + if (MEM_isLittleEndian()) { + if (MEM_64bits()) { + return ZSTD_countTrailingZeros64((U64)val) >> 3; + } else { + return ZSTD_countTrailingZeros32((U32)val) >> 3; + } + } else { /* Big Endian CPU */ + if (MEM_64bits()) { + return ZSTD_countLeadingZeros64((U64)val) >> 3; + } else { + return ZSTD_countLeadingZeros32((U32)val) >> 3; + } + } +} + +MEM_STATIC unsigned ZSTD_highbit32(U32 val) /* compress, dictBuilder, de= codeCorpus */ +{ + assert(val !=3D 0); + return 31 - ZSTD_countLeadingZeros32(val); +} + +/* ZSTD_rotateRight_*(): + * Rotates a bitfield to the right by "count" bits. 
+ * https://en.wikipedia.org/w/index.php?title=3DCircular_shift&oldid=3D991= 635599#Implementing_circular_shifts + */ +MEM_STATIC +U64 ZSTD_rotateRight_U64(U64 const value, U32 count) { + assert(count < 64); + count &=3D 0x3F; /* for fickle pattern recognition */ + return (value >> count) | (U64)(value << ((0U - count) & 0x3F)); +} + +MEM_STATIC +U32 ZSTD_rotateRight_U32(U32 const value, U32 count) { + assert(count < 32); + count &=3D 0x1F; /* for fickle pattern recognition */ + return (value >> count) | (U32)(value << ((0U - count) & 0x1F)); +} + +MEM_STATIC +U16 ZSTD_rotateRight_U16(U16 const value, U32 count) { + assert(count < 16); + count &=3D 0x0F; /* for fickle pattern recognition */ + return (value >> count) | (U16)(value << ((0U - count) & 0x0F)); +} + +#endif /* ZSTD_BITS_H */ diff --git a/lib/zstd/common/bitstream.h b/lib/zstd/common/bitstream.h index feef3a1b1d60..86439da0eea7 100644 --- a/lib/zstd/common/bitstream.h +++ b/lib/zstd/common/bitstream.h @@ -1,7 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* ****************************************************************** * bitstream * Part of FSE library - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * * You can contact the author at : * - Source repository : https://github.com/Cyan4973/FiniteStateEntropy @@ -27,7 +28,7 @@ #include "compiler.h" /* UNLIKELY() */ #include "debug.h" /* assert(), DEBUGLOG(), RAWLOG() */ #include "error_private.h" /* error codes and messages */ - +#include "bits.h" /* ZSTD_highbit32 */ =20 /*=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D * Target specific @@ -41,12 +42,13 @@ /*-****************************************** * bitStream encoding API (write forward) ********************************************/ +typedef size_t BitContainerType; /* bitStream can mix input from multiple sources. * A critical property of these streams is that they encode and decode in = **reverse** direction. * So the first bit sequence you add will be the last to be read, like a L= IFO stack. */ typedef struct { - size_t bitContainer; + BitContainerType bitContainer; unsigned bitPos; char* startPtr; char* ptr; @@ -54,7 +56,7 @@ typedef struct { } BIT_CStream_t; =20 MEM_STATIC size_t BIT_initCStream(BIT_CStream_t* bitC, void* dstBuffer, si= ze_t dstCapacity); -MEM_STATIC void BIT_addBits(BIT_CStream_t* bitC, size_t value, unsigned = nbBits); +MEM_STATIC void BIT_addBits(BIT_CStream_t* bitC, BitContainerType value,= unsigned nbBits); MEM_STATIC void BIT_flushBits(BIT_CStream_t* bitC); MEM_STATIC size_t BIT_closeCStream(BIT_CStream_t* bitC); =20 @@ -63,7 +65,7 @@ MEM_STATIC size_t BIT_closeCStream(BIT_CStream_t* bitC); * `dstCapacity` must be >=3D sizeof(bitD->bitContainer), otherwise @retur= n will be an error code. * * bits are first added to a local register. -* Local register is size_t, hence 64-bits on 64-bits systems, or 32-bits = on 32-bits systems. +* Local register is BitContainerType, 64-bits on 64-bits systems, or 32-b= its on 32-bits systems. * Writing data into memory is an explicit operation, performed by the flu= shBits function. * Hence keep track how many bits are potentially stored into local regist= er to avoid register overflow. * After a flushBits, a maximum of 7 bits might still be stored into local= register. 
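To make the write-forward contract above concrete, here is a toy encoder against the declarations in this header. This is a sketch only: toy_encode is an illustrative name, not upstream code, and it assumes dstCapacity is at least sizeof(bitC.bitContainer):

    /* Accumulate two fields in the local register, then flush.
     * The matching reader returns them in reverse (LIFO) order. */
    size_t toy_encode(void* dst, size_t dstCapacity)
    {
        BIT_CStream_t bitC;
        size_t const initErr = BIT_initCStream(&bitC, dst, dstCapacity);
        if (ERR_isError(initErr)) return initErr;  /* dstCapacity too small */
        BIT_addBits(&bitC, 5, 4);   /* written first => decoded last */
        BIT_addBits(&bitC, 2, 3);   /* written last => decoded first */
        BIT_flushBits(&bitC);       /* at most 7 bits stay in the register */
        return BIT_closeCStream(&bitC);  /* 0 means dst overflowed */
    }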
@@ -80,28 +82,28 @@ MEM_STATIC size_t BIT_closeCStream(BIT_CStream_t* bitC);
 * bitStream decoding API (read backward)
 **********************************************/
typedef struct {
-    size_t   bitContainer;
+    BitContainerType bitContainer;
    unsigned bitsConsumed;
    const char* ptr;
    const char* start;
    const char* limitPtr;
} BIT_DStream_t;

-typedef enum { BIT_DStream_unfinished = 0,
-               BIT_DStream_endOfBuffer = 1,
-               BIT_DStream_completed = 2,
-               BIT_DStream_overflow = 3 } BIT_DStream_status;  /* result of BIT_reloadDStream() */
-               /* 1,2,4,8 would be better for bitmap combinations, but slows down performance a bit ... :( */
+typedef enum { BIT_DStream_unfinished = 0, /* fully refilled */
+               BIT_DStream_endOfBuffer = 1, /* still some bits left in bitstream */
+               BIT_DStream_completed = 2, /* bitstream entirely consumed, bit-exact */
+               BIT_DStream_overflow = 3 /* user requested more bits than present in bitstream */
+    } BIT_DStream_status;  /* result of BIT_reloadDStream() */

MEM_STATIC size_t   BIT_initDStream(BIT_DStream_t* bitD, const void* srcBuffer, size_t srcSize);
-MEM_STATIC size_t BIT_readBits(BIT_DStream_t* bitD, unsigned nbBits);
+MEM_STATIC BitContainerType BIT_readBits(BIT_DStream_t* bitD, unsigned nbBits);
MEM_STATIC BIT_DStream_status BIT_reloadDStream(BIT_DStream_t* bitD);
MEM_STATIC unsigned BIT_endOfDStream(const BIT_DStream_t* bitD);


/* Start by invoking BIT_initDStream().
*  A chunk of the bitStream is then stored into a local register.
-*  Local register size is 64-bits on 64-bits systems, 32-bits on 32-bits systems (size_t).
+*  Local register size is 64-bits on 64-bits systems, 32-bits on 32-bits systems (BitContainerType).
*  You can then retrieve bitFields stored into the local register, **in reverse order**.
*  Local register is explicitly reloaded from memory by the BIT_reloadDStream() method.
*  A reload guarantees a minimum of ((8*sizeof(bitD->bitContainer))-7) bits when its result is BIT_DStream_unfinished.
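And the matching toy reader, showing the reverse read order and where the BIT_DStream_status values come into play. Same caveats as the encoder sketch above (toy_decode is an illustrative name, assuming a stream produced by toy_encode()):

    /* Read back the two fields written by toy_encode(), last-written
     * first. BIT_initDStream() locates the end-mark and primes the
     * register; BIT_endOfDStream() confirms exact consumption. */
    size_t toy_decode(const void* src, size_t srcSize)
    {
        BIT_DStream_t bitD;
        size_t const initErr = BIT_initDStream(&bitD, src, srcSize);
        if (ERR_isError(initErr)) return initErr;
        {   BitContainerType const b = BIT_readBits(&bitD, 3);  /* == 2 */
            BitContainerType const a = BIT_readBits(&bitD, 4);  /* == 5 */
            (void)a; (void)b;
        }
        /* BIT_reloadDStream() would report BIT_DStream_completed here */
        return BIT_endOfDStream(&bitD) ? 0 : ERROR(corruption_detected);
    }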
@@ -113,7 +115,7 @@ MEM_STATIC unsigned BIT_endOfDStream(const BIT_DStream_= t* bitD); /*-**************************************** * unsafe API ******************************************/ -MEM_STATIC void BIT_addBitsFast(BIT_CStream_t* bitC, size_t value, unsigne= d nbBits); +MEM_STATIC void BIT_addBitsFast(BIT_CStream_t* bitC, BitContainerType valu= e, unsigned nbBits); /* faster, but works only if value is "clean", meaning all high bits above= nbBits are 0 */ =20 MEM_STATIC void BIT_flushBitsFast(BIT_CStream_t* bitC); @@ -122,33 +124,6 @@ MEM_STATIC void BIT_flushBitsFast(BIT_CStream_t* bitC); MEM_STATIC size_t BIT_readBitsFast(BIT_DStream_t* bitD, unsigned nbBits); /* faster, but works only if nbBits >=3D 1 */ =20 - - -/*-************************************************************** -* Internal functions -****************************************************************/ -MEM_STATIC unsigned BIT_highbit32 (U32 val) -{ - assert(val !=3D 0); - { -# if (__GNUC__ >=3D 3) /* Use GCC Intrinsic */ - return __builtin_clz (val) ^ 31; -# else /* Software version */ - static const unsigned DeBruijnClz[32] =3D { 0, 9, 1, 10, 13, 21,= 2, 29, - 11, 14, 16, 18, 22, 25, = 3, 30, - 8, 12, 20, 28, 15, 17, 2= 4, 7, - 19, 27, 23, 6, 26, 5, = 4, 31 }; - U32 v =3D val; - v |=3D v >> 1; - v |=3D v >> 2; - v |=3D v >> 4; - v |=3D v >> 8; - v |=3D v >> 16; - return DeBruijnClz[ (U32) (v * 0x07C4ACDDU) >> 27]; -# endif - } -} - /*=3D=3D=3D=3D=3D Local Constants =3D=3D=3D=3D=3D*/ static const unsigned BIT_mask[] =3D { 0, 1, 3, 7, 0xF, 0x1F, @@ -178,16 +153,22 @@ MEM_STATIC size_t BIT_initCStream(BIT_CStream_t* bitC, return 0; } =20 +FORCE_INLINE_TEMPLATE BitContainerType BIT_getLowerBits(BitContainerType b= itContainer, U32 const nbBits) +{ + assert(nbBits < BIT_MASK_SIZE); + return bitContainer & BIT_mask[nbBits]; +} + /*! BIT_addBits() : * can add up to 31 bits into `bitC`. * Note : does not check for register overflow ! */ MEM_STATIC void BIT_addBits(BIT_CStream_t* bitC, - size_t value, unsigned nbBits) + BitContainerType value, unsigned nbBits) { DEBUG_STATIC_ASSERT(BIT_MASK_SIZE =3D=3D 32); assert(nbBits < BIT_MASK_SIZE); assert(nbBits + bitC->bitPos < sizeof(bitC->bitContainer) * 8); - bitC->bitContainer |=3D (value & BIT_mask[nbBits]) << bitC->bitPos; + bitC->bitContainer |=3D BIT_getLowerBits(value, nbBits) << bitC->bitPo= s; bitC->bitPos +=3D nbBits; } =20 @@ -195,7 +176,7 @@ MEM_STATIC void BIT_addBits(BIT_CStream_t* bitC, * works only if `value` is _clean_, * meaning all high bits above nbBits are 0 */ MEM_STATIC void BIT_addBitsFast(BIT_CStream_t* bitC, - size_t value, unsigned nbBits) + BitContainerType value, unsigned nbBits) { assert((value>>nbBits) =3D=3D 0); assert(nbBits + bitC->bitPos < sizeof(bitC->bitContainer) * 8); @@ -242,7 +223,7 @@ MEM_STATIC size_t BIT_closeCStream(BIT_CStream_t* bitC) BIT_addBitsFast(bitC, 1, 1); /* endMark */ BIT_flushBits(bitC); if (bitC->ptr >=3D bitC->endPtr) return 0; /* overflow detected */ - return (bitC->ptr - bitC->startPtr) + (bitC->bitPos > 0); + return (size_t)(bitC->ptr - bitC->startPtr) + (bitC->bitPos > 0); } =20 =20 @@ -266,35 +247,35 @@ MEM_STATIC size_t BIT_initDStream(BIT_DStream_t* bitD= , const void* srcBuffer, si bitD->ptr =3D (const char*)srcBuffer + srcSize - sizeof(bitD->bi= tContainer); bitD->bitContainer =3D MEM_readLEST(bitD->ptr); { BYTE const lastByte =3D ((const BYTE*)srcBuffer)[srcSize-1]; - bitD->bitsConsumed =3D lastByte ? 8 - BIT_highbit32(lastByte) : = 0; /* ensures bitsConsumed is always set */ + bitD->bitsConsumed =3D lastByte ? 
8 - ZSTD_highbit32(lastByte) := 0; /* ensures bitsConsumed is always set */ if (lastByte =3D=3D 0) return ERROR(GENERIC); /* endMark not pre= sent */ } } else { bitD->ptr =3D bitD->start; bitD->bitContainer =3D *(const BYTE*)(bitD->start); switch(srcSize) { - case 7: bitD->bitContainer +=3D (size_t)(((const BYTE*)(srcBuffer)= )[6]) << (sizeof(bitD->bitContainer)*8 - 16); + case 7: bitD->bitContainer +=3D (BitContainerType)(((const BYTE*)(= srcBuffer))[6]) << (sizeof(bitD->bitContainer)*8 - 16); ZSTD_FALLTHROUGH; =20 - case 6: bitD->bitContainer +=3D (size_t)(((const BYTE*)(srcBuffer)= )[5]) << (sizeof(bitD->bitContainer)*8 - 24); + case 6: bitD->bitContainer +=3D (BitContainerType)(((const BYTE*)(= srcBuffer))[5]) << (sizeof(bitD->bitContainer)*8 - 24); ZSTD_FALLTHROUGH; =20 - case 5: bitD->bitContainer +=3D (size_t)(((const BYTE*)(srcBuffer)= )[4]) << (sizeof(bitD->bitContainer)*8 - 32); + case 5: bitD->bitContainer +=3D (BitContainerType)(((const BYTE*)(= srcBuffer))[4]) << (sizeof(bitD->bitContainer)*8 - 32); ZSTD_FALLTHROUGH; =20 - case 4: bitD->bitContainer +=3D (size_t)(((const BYTE*)(srcBuffer)= )[3]) << 24; + case 4: bitD->bitContainer +=3D (BitContainerType)(((const BYTE*)(= srcBuffer))[3]) << 24; ZSTD_FALLTHROUGH; =20 - case 3: bitD->bitContainer +=3D (size_t)(((const BYTE*)(srcBuffer)= )[2]) << 16; + case 3: bitD->bitContainer +=3D (BitContainerType)(((const BYTE*)(= srcBuffer))[2]) << 16; ZSTD_FALLTHROUGH; =20 - case 2: bitD->bitContainer +=3D (size_t)(((const BYTE*)(srcBuffer)= )[1]) << 8; + case 2: bitD->bitContainer +=3D (BitContainerType)(((const BYTE*)(= srcBuffer))[1]) << 8; ZSTD_FALLTHROUGH; =20 default: break; } { BYTE const lastByte =3D ((const BYTE*)srcBuffer)[srcSize-1]; - bitD->bitsConsumed =3D lastByte ? 8 - BIT_highbit32(lastByte) = : 0; + bitD->bitsConsumed =3D lastByte ? 8 - ZSTD_highbit32(lastByte)= : 0; if (lastByte =3D=3D 0) return ERROR(corruption_detected); /* = endMark not present */ } bitD->bitsConsumed +=3D (U32)(sizeof(bitD->bitContainer) - srcSize= )*8; @@ -303,12 +284,12 @@ MEM_STATIC size_t BIT_initDStream(BIT_DStream_t* bitD= , const void* srcBuffer, si return srcSize; } =20 -MEM_STATIC FORCE_INLINE_ATTR size_t BIT_getUpperBits(size_t bitContainer, = U32 const start) +FORCE_INLINE_TEMPLATE BitContainerType BIT_getUpperBits(BitContainerType b= itContainer, U32 const start) { return bitContainer >> start; } =20 -MEM_STATIC FORCE_INLINE_ATTR size_t BIT_getMiddleBits(size_t bitContainer,= U32 const start, U32 const nbBits) +FORCE_INLINE_TEMPLATE BitContainerType BIT_getMiddleBits(BitContainerType = bitContainer, U32 const start, U32 const nbBits) { U32 const regMask =3D sizeof(bitContainer)*8 - 1; /* if start > regMask, bitstream is corrupted, and result is undefined= */ @@ -318,26 +299,20 @@ MEM_STATIC FORCE_INLINE_ATTR size_t BIT_getMiddleBits= (size_t bitContainer, U32 c * such cpus old (pre-Haswell, 2013) and their performance is not of t= hat * importance. */ -#if defined(__x86_64__) || defined(_M_X86) +#if defined(__x86_64__) || defined(_M_X64) return (bitContainer >> (start & regMask)) & ((((U64)1) << nbBits) - 1= ); #else return (bitContainer >> (start & regMask)) & BIT_mask[nbBits]; #endif } =20 -MEM_STATIC FORCE_INLINE_ATTR size_t BIT_getLowerBits(size_t bitContainer, = U32 const nbBits) -{ - assert(nbBits < BIT_MASK_SIZE); - return bitContainer & BIT_mask[nbBits]; -} - /*! BIT_lookBits() : * Provides next n bits from local register. * local register is not modified. * On 32-bits, maxNbBits=3D=3D24. * On 64-bits, maxNbBits=3D=3D56. 
* @return : value extracted */
-MEM_STATIC  FORCE_INLINE_ATTR size_t BIT_lookBits(const BIT_DStream_t*  bitD, U32 nbBits)
+FORCE_INLINE_TEMPLATE BitContainerType BIT_lookBits(const BIT_DStream_t*  bitD, U32 nbBits)
{
    /* arbitrate between double-shift and shift+mask */
#if 1
@@ -353,14 +328,14 @@ MEM_STATIC  FORCE_INLINE_ATTR size_t BIT_lookBits(const BIT_DStream_t*  bitD, U3

/*! BIT_lookBitsFast() :
 *  unsafe version; only works if nbBits >= 1 */
-MEM_STATIC size_t BIT_lookBitsFast(const BIT_DStream_t* bitD, U32 nbBits)
+MEM_STATIC BitContainerType BIT_lookBitsFast(const BIT_DStream_t* bitD, U32 nbBits)
{
    U32 const regMask = sizeof(bitD->bitContainer)*8 - 1;
    assert(nbBits >= 1);
    return (bitD->bitContainer << (bitD->bitsConsumed & regMask)) >> (((regMask+1)-nbBits) & regMask);
}

-MEM_STATIC FORCE_INLINE_ATTR void BIT_skipBits(BIT_DStream_t* bitD, U32 nbBits)
+FORCE_INLINE_TEMPLATE void BIT_skipBits(BIT_DStream_t* bitD, U32 nbBits)
{
    bitD->bitsConsumed += nbBits;
}
@@ -369,23 +344,38 @@ MEM_STATIC FORCE_INLINE_ATTR void BIT_skipBits(BIT_DStream_t* bitD, U32 nbBits)
 *  Read (consume) next n bits from local register and update.
 *  Pay attention to not read more than nbBits contained into local register.
 * @return : extracted value. */
-MEM_STATIC FORCE_INLINE_ATTR size_t BIT_readBits(BIT_DStream_t* bitD, unsigned nbBits)
+FORCE_INLINE_TEMPLATE BitContainerType BIT_readBits(BIT_DStream_t* bitD, unsigned nbBits)
{
-    size_t const value = BIT_lookBits(bitD, nbBits);
+    BitContainerType const value = BIT_lookBits(bitD, nbBits);
    BIT_skipBits(bitD, nbBits);
    return value;
}

/*! BIT_readBitsFast() :
- *  unsafe version; only works only if nbBits >= 1 */
+ *  unsafe version; only works if nbBits >= 1 */
-MEM_STATIC size_t BIT_readBitsFast(BIT_DStream_t* bitD, unsigned nbBits)
+MEM_STATIC BitContainerType BIT_readBitsFast(BIT_DStream_t* bitD, unsigned nbBits)
{
-    size_t const value = BIT_lookBitsFast(bitD, nbBits);
+    BitContainerType const value = BIT_lookBitsFast(bitD, nbBits);
    assert(nbBits >= 1);
    BIT_skipBits(bitD, nbBits);
    return value;
}

+/*! BIT_reloadDStream_internal() :
+ *  Simple variant of BIT_reloadDStream(), with two conditions:
+ *  1. bitstream is valid : bitsConsumed <= sizeof(bitD->bitContainer)*8
+ *  2. look window is valid after shifted down : bitD->ptr >= bitD->start
+ */
+MEM_STATIC BIT_DStream_status BIT_reloadDStream_internal(BIT_DStream_t* bitD)
+{
+    assert(bitD->bitsConsumed <= sizeof(bitD->bitContainer)*8);
+    bitD->ptr -= bitD->bitsConsumed >> 3;
+    assert(bitD->ptr >= bitD->start);
+    bitD->bitsConsumed &= 7;
+    bitD->bitContainer = MEM_readLEST(bitD->ptr);
+    return BIT_DStream_unfinished;
+}
+
/*! BIT_reloadDStreamFast() :
 *  Similar to BIT_reloadDStream(), but with two differences:
 *  1. bitsConsumed <= sizeof(bitD->bitContainer)*8 must hold!
@@ -396,31 +386,35 @@ MEM_STATIC BIT_DStream_status BIT_reloadDStreamFast(BIT_DStream_t* bitD)
{
    if (UNLIKELY(bitD->ptr < bitD->limitPtr))
        return BIT_DStream_overflow;
-    assert(bitD->bitsConsumed <= sizeof(bitD->bitContainer)*8);
-    bitD->ptr -= bitD->bitsConsumed >> 3;
-    bitD->bitsConsumed &= 7;
-    bitD->bitContainer = MEM_readLEST(bitD->ptr);
-    return BIT_DStream_unfinished;
+    return BIT_reloadDStream_internal(bitD);
}

/*! BIT_reloadDStream() :
 *  Refill `bitD` from buffer previously set in BIT_initDStream() .
- *  This function is safe, it guarantees it will not read beyond src buffer.
+ *  This function is safe, it guarantees it will never read beyond src buffer.
* @return : status of `BIT_DStream_t` internal register.
*           when status == BIT_DStream_unfinished, internal register is filled with at least 25 or 57 bits */
-MEM_STATIC BIT_DStream_status BIT_reloadDStream(BIT_DStream_t* bitD)
+FORCE_INLINE_TEMPLATE BIT_DStream_status BIT_reloadDStream(BIT_DStream_t* bitD)
{
-    if (bitD->bitsConsumed > (sizeof(bitD->bitContainer)*8))  /* overflow detected, like end of stream */
+    /* note : once in overflow mode, a bitstream remains in this mode until it's reset */
+    if (UNLIKELY(bitD->bitsConsumed > (sizeof(bitD->bitContainer)*8))) {
+        static const BitContainerType zeroFilled = 0;
+        bitD->ptr = (const char*)&zeroFilled;  /* aliasing is allowed for char */
+        /* overflow detected, erroneous scenario or end of stream: no update */
        return BIT_DStream_overflow;
+    }
+
+    assert(bitD->ptr >= bitD->start);

    if (bitD->ptr >= bitD->limitPtr) {
-        return BIT_reloadDStreamFast(bitD);
+        return BIT_reloadDStream_internal(bitD);
    }
    if (bitD->ptr == bitD->start) {
+        /* reached end of bitStream => no update */
        if (bitD->bitsConsumed < sizeof(bitD->bitContainer)*8) return BIT_DStream_endOfBuffer;
        return BIT_DStream_completed;
    }
-    /* start < ptr < limitPtr */
+    /* start < ptr < limitPtr => cautious update */
    {   U32 nbBytes = bitD->bitsConsumed >> 3;
        BIT_DStream_status result = BIT_DStream_unfinished;
        if (bitD->ptr - nbBytes < bitD->start) {
@@ -442,5 +436,4 @@ MEM_STATIC unsigned BIT_endOfDStream(const BIT_DStream_t* DStream)
    return ((DStream->ptr == DStream->start) && (DStream->bitsConsumed == sizeof(DStream->bitContainer)*8));
}

-
#endif /* BITSTREAM_H_MODULE */
diff --git a/lib/zstd/common/compiler.h b/lib/zstd/common/compiler.h
index c42d39faf9bd..dc9bd15e174e 100644
--- a/lib/zstd/common/compiler.h
+++ b/lib/zstd/common/compiler.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
/*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under both the BSD-style license (found in the
@@ -11,6 +12,8 @@
#ifndef ZSTD_COMPILER_H
#define ZSTD_COMPILER_H

+#include
+
#include "portability_macros.h"

/*-*******************************************************
@@ -41,12 +44,15 @@
*/
#define WIN_CDECL

+/* UNUSED_ATTR tells the compiler it is okay if the function is unused. */
+#define UNUSED_ATTR __attribute__((unused))
+
/*
 * FORCE_INLINE_TEMPLATE is used to define C "templates", which take constant
 * parameters. They must be inlined for the compiler to eliminate the constant
 * branches.
 */
-#define FORCE_INLINE_TEMPLATE static INLINE_KEYWORD FORCE_INLINE_ATTR
+#define FORCE_INLINE_TEMPLATE static INLINE_KEYWORD FORCE_INLINE_ATTR UNUSED_ATTR
/*
 * HINT_INLINE is used to help the compiler generate better code. It is *not*
 * used for "templates", so it can be tweaked based on the compilers
@@ -61,11 +67,21 @@
#if !defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 4 && __GNUC_MINOR__ >= 8 && __GNUC__ < 5
#  define HINT_INLINE static INLINE_KEYWORD
#else
-#  define HINT_INLINE static INLINE_KEYWORD FORCE_INLINE_ATTR
+#  define HINT_INLINE FORCE_INLINE_TEMPLATE
#endif

-/* UNUSED_ATTR tells the compiler it is okay if the function is unused. */
-#define UNUSED_ATTR __attribute__((unused))
+/* "soft" inline :
+ * The compiler is free to select if it's a good idea to inline or not.
+ * The main objective is to silence compiler warnings
+ * when a defined function is included but not used.
+ * + * Note : this macro is prefixed `MEM_` because it used to be provided by = `mem.h` unit. + * Updating the prefix is probably preferable, but requires a fairly large= codemod, + * since this name is used everywhere. + */ +#ifndef MEM_STATIC /* already defined in Linux Kernel mem.h */ +#define MEM_STATIC static __inline UNUSED_ATTR +#endif =20 /* force no inlining */ #define FORCE_NOINLINE static __attribute__((__noinline__)) @@ -86,23 +102,24 @@ # define PREFETCH_L1(ptr) __builtin_prefetch((ptr), 0 /* rw=3D=3Dread */= , 3 /* locality */) # define PREFETCH_L2(ptr) __builtin_prefetch((ptr), 0 /* rw=3D=3Dread */= , 2 /* locality */) #elif defined(__aarch64__) -# define PREFETCH_L1(ptr) __asm__ __volatile__("prfm pldl1keep, %0" ::"Q= "(*(ptr))) -# define PREFETCH_L2(ptr) __asm__ __volatile__("prfm pldl2keep, %0" ::"Q= "(*(ptr))) +# define PREFETCH_L1(ptr) do { __asm__ __volatile__("prfm pldl1keep, %0"= ::"Q"(*(ptr))); } while (0) +# define PREFETCH_L2(ptr) do { __asm__ __volatile__("prfm pldl2keep, %0"= ::"Q"(*(ptr))); } while (0) #else -# define PREFETCH_L1(ptr) (void)(ptr) /* disabled */ -# define PREFETCH_L2(ptr) (void)(ptr) /* disabled */ +# define PREFETCH_L1(ptr) do { (void)(ptr); } while (0) /* disabled */ +# define PREFETCH_L2(ptr) do { (void)(ptr); } while (0) /* disabled */ #endif /* NO_PREFETCH */ =20 #define CACHELINE_SIZE 64 =20 -#define PREFETCH_AREA(p, s) { \ - const char* const _ptr =3D (const char*)(p); \ - size_t const _size =3D (size_t)(s); \ - size_t _pos; \ - for (_pos=3D0; _pos<_size; _pos+=3DCACHELINE_SIZE) { \ - PREFETCH_L2(_ptr + _pos); \ - } \ -} +#define PREFETCH_AREA(p, s) \ + do { \ + const char* const _ptr =3D (const char*)(p); \ + size_t const _size =3D (size_t)(s); \ + size_t _pos; \ + for (_pos=3D0; _pos<_size; _pos+=3DCACHELINE_SIZE) { \ + PREFETCH_L2(_ptr + _pos); \ + } \ + } while (0) =20 /* vectorization * older GCC (pre gcc-4.3 picked as the cutoff) uses a different syntax, @@ -126,16 +143,13 @@ #define UNLIKELY(x) (__builtin_expect((x), 0)) =20 #if __has_builtin(__builtin_unreachable) || (defined(__GNUC__) && (__GNUC_= _ > 4 || (__GNUC__ =3D=3D 4 && __GNUC_MINOR__ >=3D 5))) -# define ZSTD_UNREACHABLE { assert(0), __builtin_unreachable(); } +# define ZSTD_UNREACHABLE do { assert(0), __builtin_unreachable(); } whil= e (0) #else -# define ZSTD_UNREACHABLE { assert(0); } +# define ZSTD_UNREACHABLE do { assert(0); } while (0) #endif =20 /* disable warnings */ =20 -/*Like DYNAMIC_BMI2 but for compile time determination of BMI2 support*/ - - /* compile time determination of SIMD support */ =20 /* C-language Attributes are added in C23. */ @@ -158,9 +172,15 @@ #define ZSTD_FALLTHROUGH fallthrough =20 /*-************************************************************** -* Alignment check +* Alignment *****************************************************************/ =20 +/* @return 1 if @u is a 2^n value, 0 otherwise + * useful to check a value is valid for alignment restrictions */ +MEM_STATIC int ZSTD_isPower2(size_t u) { + return (u & (u-1)) =3D=3D 0; +} + /* this test was initially positioned in mem.h, * but this file is removed (or replaced) for linux kernel * so it's now hosted in compiler.h, @@ -175,10 +195,95 @@ =20 #endif /* ZSTD_ALIGNOF */ =20 +#ifndef ZSTD_ALIGNED +/* C90-compatible alignment macro (GCC/Clang). Adjust for other compilers = if needed. 
*/
+#define ZSTD_ALIGNED(a) __attribute__((aligned(a)))
+#endif /* ZSTD_ALIGNED */
+
+
/*-**************************************************************
* Sanitizer
*****************************************************************/

+/*
+ * Zstd relies on pointer overflow in its decompressor.
+ * We add this attribute to functions that rely on pointer overflow.
+ */
+#ifndef ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+# if __has_attribute(no_sanitize)
+#  if !defined(__clang__) && defined(__GNUC__) && __GNUC__ < 8
+   /* gcc < 8 only has signed-integer-overflow which triggers on pointer overflow */
+#   define ZSTD_ALLOW_POINTER_OVERFLOW_ATTR __attribute__((no_sanitize("signed-integer-overflow")))
+#  else
+   /* older versions of clang [3.7, 5.0) will warn that pointer-overflow is ignored. */
+#   define ZSTD_ALLOW_POINTER_OVERFLOW_ATTR __attribute__((no_sanitize("pointer-overflow")))
+#  endif
+# else
+#  define ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+# endif
+#endif
+
+/*
+ * Helper function to perform a wrapped pointer difference without triggering
+ * UBSAN.
+ *
+ * @returns lhs - rhs with wrapping
+ */
+MEM_STATIC
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+ptrdiff_t ZSTD_wrappedPtrDiff(unsigned char const* lhs, unsigned char const* rhs)
+{
+    return lhs - rhs;
+}
+
+/*
+ * Helper function to perform a wrapped pointer add without triggering UBSAN.
+ *
+ * @return ptr + add with wrapping
+ */
+MEM_STATIC
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+unsigned char const* ZSTD_wrappedPtrAdd(unsigned char const* ptr, ptrdiff_t add)
+{
+    return ptr + add;
+}
+
+/*
+ * Helper function to perform a wrapped pointer subtraction without triggering
+ * UBSAN.
+ *
+ * @return ptr - sub with wrapping
+ */
+MEM_STATIC
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+unsigned char const* ZSTD_wrappedPtrSub(unsigned char const* ptr, ptrdiff_t sub)
+{
+    return ptr - sub;
+}
+
+/*
+ * Helper function to add to a pointer that works around C's undefined behavior
+ * of adding 0 to NULL.
+ *
+ * @returns `ptr + add` except it defines `NULL + 0 == NULL`.
+ */
+MEM_STATIC
+unsigned char* ZSTD_maybeNullPtrAdd(unsigned char* ptr, ptrdiff_t add)
+{
+    return add > 0 ? ptr + add : ptr;
+}
+
+/* Issue #3240 reports an ASAN failure on an llvm-mingw build. Out of an
+ * abundance of caution, disable our custom poisoning on mingw. */
+#ifdef __MINGW32__
+#ifndef ZSTD_ASAN_DONT_POISON_WORKSPACE
+#define ZSTD_ASAN_DONT_POISON_WORKSPACE 1
+#endif
+#ifndef ZSTD_MSAN_DONT_POISON_WORKSPACE
+#define ZSTD_MSAN_DONT_POISON_WORKSPACE 1
+#endif
+#endif
+


#endif /* ZSTD_COMPILER_H */
diff --git a/lib/zstd/common/cpu.h b/lib/zstd/common/cpu.h
index 0db7b42407ee..d8319a2bef4c 100644
--- a/lib/zstd/common/cpu.h
+++ b/lib/zstd/common/cpu.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
/*
- * Copyright (c) Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under both the BSD-style license (found in the
diff --git a/lib/zstd/common/debug.c b/lib/zstd/common/debug.c
index bb863c9ea616..8eb6aa9a3b20 100644
--- a/lib/zstd/common/debug.c
+++ b/lib/zstd/common/debug.c
@@ -1,7 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
/* ******************************************************************
 * debug
 * Part of FSE library
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
* * You can contact the author at : * - Source repository : https://github.com/Cyan4973/FiniteStateEntropy @@ -21,4 +22,10 @@ =20 #include "debug.h" =20 +#if (DEBUGLEVEL>=3D2) +/* We only use this when DEBUGLEVEL>=3D2, but we get -Werror=3Dpedantic er= rors if a + * translation unit is empty. So remove this from Linux kernel builds, but + * otherwise just leave it in. + */ int g_debuglevel =3D DEBUGLEVEL; +#endif diff --git a/lib/zstd/common/debug.h b/lib/zstd/common/debug.h index 6dd88d1fbd02..c8a10281f112 100644 --- a/lib/zstd/common/debug.h +++ b/lib/zstd/common/debug.h @@ -1,7 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* ****************************************************************** * debug * Part of FSE library - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * * You can contact the author at : * - Source repository : https://github.com/Cyan4973/FiniteStateEntropy @@ -33,7 +34,6 @@ #define DEBUG_H_12987983217 =20 =20 - /* static assert is triggered at compile time, leaving no runtime artefact. * static assert only works with compile-time constants. * Also, this variant can only be used inside a function. */ @@ -82,20 +82,27 @@ extern int g_debuglevel; /* the variable is only declar= ed, It's useful when enabling very verbose levels on selective conditions (such as position in s= rc) */ =20 -# define RAWLOG(l, ...) { \ - if (l<=3Dg_debuglevel) { \ - ZSTD_DEBUG_PRINT(__VA_ARGS__); \ - } } -# define DEBUGLOG(l, ...) { \ - if (l<=3Dg_debuglevel) { \ - ZSTD_DEBUG_PRINT(__FILE__ ": " __VA_ARGS__); \ - ZSTD_DEBUG_PRINT(" \n"); \ - } } +# define RAWLOG(l, ...) \ + do { \ + if (l<=3Dg_debuglevel) { \ + ZSTD_DEBUG_PRINT(__VA_ARGS__); \ + } \ + } while (0) + +#define STRINGIFY(x) #x +#define TOSTRING(x) STRINGIFY(x) +#define LINE_AS_STRING TOSTRING(__LINE__) + +# define DEBUGLOG(l, ...) \ + do { \ + if (l<=3Dg_debuglevel) { \ + ZSTD_DEBUG_PRINT(__FILE__ ":" LINE_AS_STRING ": " __VA_ARGS__)= ; \ + ZSTD_DEBUG_PRINT(" \n"); \ + } \ + } while (0) #else -# define RAWLOG(l, ...) {} /* disabled */ -# define DEBUGLOG(l, ...) {} /* disabled */ +# define RAWLOG(l, ...) do { } while (0) /* disabled */ +# define DEBUGLOG(l, ...) do { } while (0) /* disabled */ #endif =20 - - #endif /* DEBUG_H_12987983217 */ diff --git a/lib/zstd/common/entropy_common.c b/lib/zstd/common/entropy_com= mon.c index fef67056f052..6cdd82233fb5 100644 --- a/lib/zstd/common/entropy_common.c +++ b/lib/zstd/common/entropy_common.c @@ -1,6 +1,7 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* ****************************************************************** * Common functions of New Generation Entropy library - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. 
*
* You can contact the author at :
* - FSE+HUF source repository : https://github.com/Cyan4973/FiniteStateEntropy
@@ -19,8 +20,8 @@
#include "error_private.h"  /* ERR_*, ERROR */
#define FSE_STATIC_LINKING_ONLY  /* FSE_MIN_TABLELOG */
#include "fse.h"
-#define HUF_STATIC_LINKING_ONLY  /* HUF_TABLELOG_ABSOLUTEMAX */
#include "huf.h"
+#include "bits.h"  /* ZSTD_highbit32, ZSTD_countTrailingZeros32 */


/*=== Version ===*/
@@ -38,23 +39,6 @@ const char* HUF_getErrorName(size_t code) { return ERR_getErrorName(code); }
/*-**************************************************************
*  FSE NCount encoding-decoding
****************************************************************/
-static U32 FSE_ctz(U32 val)
-{
-    assert(val != 0);
-    {
-#   if (__GNUC__ >= 3)   /* GCC Intrinsic */
-        return __builtin_ctz(val);
-#   else   /* Software version */
-        U32 count = 0;
-        while ((val & 1) == 0) {
-            val >>= 1;
-            ++count;
-        }
-        return count;
-#   endif
-    }
-}
-
FORCE_INLINE_TEMPLATE
size_t FSE_readNCount_body(short* normalizedCounter, unsigned* maxSVPtr, unsigned* tableLogPtr,
                           const void* headerBuffer, size_t hbSize)
@@ -102,7 +86,7 @@ size_t FSE_readNCount_body(short* normalizedCounter, unsigned* maxSVPtr, unsigne
             * repeat.
             * Avoid UB by setting the high bit to 1.
             */
-            int repeats = FSE_ctz(~bitStream | 0x80000000) >> 1;
+            int repeats = ZSTD_countTrailingZeros32(~bitStream | 0x80000000) >> 1;
            while (repeats >= 12) {
                charnum += 3 * 12;
                if (LIKELY(ip <= iend-7)) {
@@ -113,7 +97,7 @@ size_t FSE_readNCount_body(short* normalizedCounter, unsigned* maxSVPtr, unsigne
                    ip = iend - 4;
                }
                bitStream = MEM_readLE32(ip) >> bitCount;
-                repeats = FSE_ctz(~bitStream | 0x80000000) >> 1;
+                repeats = ZSTD_countTrailingZeros32(~bitStream | 0x80000000) >> 1;
            }
            charnum += 3 * repeats;
            bitStream >>= 2 * repeats;
@@ -178,7 +162,7 @@ size_t FSE_readNCount_body(short* normalizedCounter, unsigned* maxSVPtr, unsigne
         * know that threshold > 1.
*/ if (remaining <=3D 1) break; - nbBits =3D BIT_highbit32(remaining) + 1; + nbBits =3D ZSTD_highbit32(remaining) + 1; threshold =3D 1 << (nbBits - 1); } if (charnum >=3D maxSV1) break; @@ -253,7 +237,7 @@ size_t HUF_readStats(BYTE* huffWeight, size_t hwSize, U= 32* rankStats, const void* src, size_t srcSize) { U32 wksp[HUF_READ_STATS_WORKSPACE_SIZE_U32]; - return HUF_readStats_wksp(huffWeight, hwSize, rankStats, nbSymbolsPtr,= tableLogPtr, src, srcSize, wksp, sizeof(wksp), /* bmi2 */ 0); + return HUF_readStats_wksp(huffWeight, hwSize, rankStats, nbSymbolsPtr,= tableLogPtr, src, srcSize, wksp, sizeof(wksp), /* flags */ 0); } =20 FORCE_INLINE_TEMPLATE size_t @@ -301,14 +285,14 @@ HUF_readStats_body(BYTE* huffWeight, size_t hwSize, U= 32* rankStats, if (weightTotal =3D=3D 0) return ERROR(corruption_detected); =20 /* get last non-null symbol weight (implied, total must be 2^n) */ - { U32 const tableLog =3D BIT_highbit32(weightTotal) + 1; + { U32 const tableLog =3D ZSTD_highbit32(weightTotal) + 1; if (tableLog > HUF_TABLELOG_MAX) return ERROR(corruption_detected); *tableLogPtr =3D tableLog; /* determine last weight */ { U32 const total =3D 1 << tableLog; U32 const rest =3D total - weightTotal; - U32 const verif =3D 1 << BIT_highbit32(rest); - U32 const lastWeight =3D BIT_highbit32(rest) + 1; + U32 const verif =3D 1 << ZSTD_highbit32(rest); + U32 const lastWeight =3D ZSTD_highbit32(rest) + 1; if (verif !=3D rest) return ERROR(corruption_detected); /* = last value must be a clean power of 2 */ huffWeight[oSize] =3D (BYTE)lastWeight; rankStats[lastWeight]++; @@ -345,13 +329,13 @@ size_t HUF_readStats_wksp(BYTE* huffWeight, size_t hw= Size, U32* rankStats, U32* nbSymbolsPtr, U32* tableLogPtr, const void* src, size_t srcSize, void* workSpace, size_t wkspSize, - int bmi2) + int flags) { #if DYNAMIC_BMI2 - if (bmi2) { + if (flags & HUF_flags_bmi2) { return HUF_readStats_body_bmi2(huffWeight, hwSize, rankStats, nbSy= mbolsPtr, tableLogPtr, src, srcSize, workSpace, wkspSize); } #endif - (void)bmi2; + (void)flags; return HUF_readStats_body_default(huffWeight, hwSize, rankStats, nbSym= bolsPtr, tableLogPtr, src, srcSize, workSpace, wkspSize); } diff --git a/lib/zstd/common/error_private.c b/lib/zstd/common/error_privat= e.c index 6d1135f8c373..6c3dbad838b6 100644 --- a/lib/zstd/common/error_private.c +++ b/lib/zstd/common/error_private.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -27,9 +28,11 @@ const char* ERR_getErrorString(ERR_enum code) case PREFIX(version_unsupported): return "Version not supported"; case PREFIX(frameParameter_unsupported): return "Unsupported frame par= ameter"; case PREFIX(frameParameter_windowTooLarge): return "Frame requires too= much memory for decoding"; - case PREFIX(corruption_detected): return "Corrupted block detected"; + case PREFIX(corruption_detected): return "Data corruption detected"; case PREFIX(checksum_wrong): return "Restored data doesn't match check= sum"; + case PREFIX(literals_headerWrong): return "Header of Literals' block d= oesn't respect format specification"; case PREFIX(parameter_unsupported): return "Unsupported parameter"; + case PREFIX(parameter_combination_unsupported): return "Unsupported co= mbination of parameters"; case PREFIX(parameter_outOfBound): return "Parameter is out of bound"; case PREFIX(init_missing): return "Context should be init first"; case PREFIX(memory_allocation): return "Allocation error : not enough = memory"; @@ -38,17 +41,23 @@ const char* ERR_getErrorString(ERR_enum code) case PREFIX(tableLog_tooLarge): return "tableLog requires too much mem= ory : unsupported"; case PREFIX(maxSymbolValue_tooLarge): return "Unsupported max Symbol V= alue : too large"; case PREFIX(maxSymbolValue_tooSmall): return "Specified maxSymbolValue= is too small"; + case PREFIX(cannotProduce_uncompressedBlock): return "This mode cannot= generate an uncompressed block"; + case PREFIX(stabilityCondition_notRespected): return "pledged buffer s= tability condition is not respected"; case PREFIX(dictionary_corrupted): return "Dictionary is corrupted"; case PREFIX(dictionary_wrong): return "Dictionary mismatch"; case PREFIX(dictionaryCreation_failed): return "Cannot create Dictiona= ry from provided samples"; case PREFIX(dstSize_tooSmall): return "Destination buffer is too small= "; case PREFIX(srcSize_wrong): return "Src size is incorrect"; case PREFIX(dstBuffer_null): return "Operation on NULL destination buf= fer"; + case PREFIX(noForwardProgress_destFull): return "Operation made no pro= gress over multiple calls, due to output buffer being full"; + case PREFIX(noForwardProgress_inputEmpty): return "Operation made no p= rogress over multiple calls, due to input being empty"; /* following error codes are not stable and may be removed or chan= ged in a future version */ case PREFIX(frameIndex_tooLarge): return "Frame index is too large"; case PREFIX(seekableIO): return "An I/O error occurred when reading/se= eking"; case PREFIX(dstBuffer_wrong): return "Destination buffer is wrong"; case PREFIX(srcBuffer_wrong): return "Source buffer is wrong"; + case PREFIX(sequenceProducer_failed): return "Block-level external seq= uence producer returned an error code"; + case PREFIX(externalSequences_invalid): return "External sequences are= not valid"; case PREFIX(maxCode): default: return notErrorCode; } diff --git a/lib/zstd/common/error_private.h b/lib/zstd/common/error_privat= e.h index ca5101e542fa..08ee87b68cca 100644 --- a/lib/zstd/common/error_private.h +++ b/lib/zstd/common/error_private.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -13,8 +14,6 @@ #ifndef ERROR_H_MODULE #define ERROR_H_MODULE =20 - - /* **************************************** * Dependencies ******************************************/ @@ -23,7 +22,6 @@ #include "debug.h" #include "zstd_deps.h" /* size_t */ =20 - /* **************************************** * Compiler-specific ******************************************/ @@ -49,8 +47,13 @@ ERR_STATIC unsigned ERR_isError(size_t code) { return (c= ode > ERROR(maxCode)); } ERR_STATIC ERR_enum ERR_getErrorCode(size_t code) { if (!ERR_isError(code)= ) return (ERR_enum)0; return (ERR_enum) (0-code); } =20 /* check and forward error code */ -#define CHECK_V_F(e, f) size_t const e =3D f; if (ERR_isError(e)) return e -#define CHECK_F(f) { CHECK_V_F(_var_err__, f); } +#define CHECK_V_F(e, f) \ + size_t const e =3D f; \ + do { \ + if (ERR_isError(e)) \ + return e; \ + } while (0) +#define CHECK_F(f) do { CHECK_V_F(_var_err__, f); } while (0) =20 =20 /*-**************************************** @@ -84,10 +87,12 @@ void _force_has_format_string(const char *format, ...) { * We want to force this function invocation to be syntactically correct, = but * we don't want to force runtime evaluation of its arguments. */ -#define _FORCE_HAS_FORMAT_STRING(...) \ - if (0) { \ - _force_has_format_string(__VA_ARGS__); \ - } +#define _FORCE_HAS_FORMAT_STRING(...) \ + do { \ + if (0) { \ + _force_has_format_string(__VA_ARGS__); \ + } \ + } while (0) =20 #define ERR_QUOTE(str) #str =20 @@ -98,48 +103,49 @@ void _force_has_format_string(const char *format, ...)= { * In order to do that (particularly, printing the conditional that failed= ), * this can't just wrap RETURN_ERROR(). */ -#define RETURN_ERROR_IF(cond, err, ...) \ - if (cond) { \ - RAWLOG(3, "%s:%d: ERROR!: check %s failed, returning %s", \ - __FILE__, __LINE__, ERR_QUOTE(cond), ERR_QUOTE(ERROR(err))); \ - _FORCE_HAS_FORMAT_STRING(__VA_ARGS__); \ - RAWLOG(3, ": " __VA_ARGS__); \ - RAWLOG(3, "\n"); \ - return ERROR(err); \ - } +#define RETURN_ERROR_IF(cond, err, ...) = \ + do { = \ + if (cond) { = \ + RAWLOG(3, "%s:%d: ERROR!: check %s failed, returning %s", = \ + __FILE__, __LINE__, ERR_QUOTE(cond), ERR_QUOTE(ERROR(err= ))); \ + _FORCE_HAS_FORMAT_STRING(__VA_ARGS__); = \ + RAWLOG(3, ": " __VA_ARGS__); = \ + RAWLOG(3, "\n"); = \ + return ERROR(err); = \ + } = \ + } while (0) =20 /* * Unconditionally return the specified error. * * In debug modes, prints additional information. */ -#define RETURN_ERROR(err, ...) \ - do { \ - RAWLOG(3, "%s:%d: ERROR!: unconditional check failed, returning %s", \ - __FILE__, __LINE__, ERR_QUOTE(ERROR(err))); \ - _FORCE_HAS_FORMAT_STRING(__VA_ARGS__); \ - RAWLOG(3, ": " __VA_ARGS__); \ - RAWLOG(3, "\n"); \ - return ERROR(err); \ - } while(0); +#define RETURN_ERROR(err, ...) = \ + do { = \ + RAWLOG(3, "%s:%d: ERROR!: unconditional check failed, returning %s= ", \ + __FILE__, __LINE__, ERR_QUOTE(ERROR(err))); = \ + _FORCE_HAS_FORMAT_STRING(__VA_ARGS__); = \ + RAWLOG(3, ": " __VA_ARGS__); = \ + RAWLOG(3, "\n"); = \ + return ERROR(err); = \ + } while(0) =20 /* * If the provided expression evaluates to an error code, returns that err= or code. * * In debug modes, prints additional information. */ -#define FORWARD_IF_ERROR(err, ...) 
\ - do { \ - size_t const err_code =3D (err); \ - if (ERR_isError(err_code)) { \ - RAWLOG(3, "%s:%d: ERROR!: forwarding error in %s: %s", \ - __FILE__, __LINE__, ERR_QUOTE(err), ERR_getErrorName(err_code= )); \ - _FORCE_HAS_FORMAT_STRING(__VA_ARGS__); \ - RAWLOG(3, ": " __VA_ARGS__); \ - RAWLOG(3, "\n"); \ - return err_code; \ - } \ - } while(0); - +#define FORWARD_IF_ERROR(err, ...) = \ + do { = \ + size_t const err_code =3D (err); = \ + if (ERR_isError(err_code)) { = \ + RAWLOG(3, "%s:%d: ERROR!: forwarding error in %s: %s", = \ + __FILE__, __LINE__, ERR_QUOTE(err), ERR_getErrorName(err= _code)); \ + _FORCE_HAS_FORMAT_STRING(__VA_ARGS__); = \ + RAWLOG(3, ": " __VA_ARGS__); = \ + RAWLOG(3, "\n"); = \ + return err_code; = \ + } = \ + } while(0) =20 #endif /* ERROR_H_MODULE */ diff --git a/lib/zstd/common/fse.h b/lib/zstd/common/fse.h index 4507043b2287..b36ce7a2a8c3 100644 --- a/lib/zstd/common/fse.h +++ b/lib/zstd/common/fse.h @@ -1,7 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* ****************************************************************** * FSE : Finite State Entropy codec * Public Prototypes declaration - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * * You can contact the author at : * - Source repository : https://github.com/Cyan4973/FiniteStateEntropy @@ -11,8 +12,6 @@ * in the COPYING file in the root directory of this source tree). * You may select, at your option, one of the above-listed licenses. ****************************************************************** */ - - #ifndef FSE_H #define FSE_H =20 @@ -22,7 +21,6 @@ ******************************************/ #include "zstd_deps.h" /* size_t, ptrdiff_t */ =20 - /*-***************************************** * FSE_PUBLIC_API : control library symbols visibility ******************************************/ @@ -50,34 +48,6 @@ FSE_PUBLIC_API unsigned FSE_versionNumber(void); /*< library version num= ber; to be used when checking dll version */ =20 =20 -/*-**************************************** -* FSE simple functions -******************************************/ -/*! FSE_compress() : - Compress content of buffer 'src', of size 'srcSize', into destination = buffer 'dst'. - 'dst' buffer must be already allocated. Compression runs faster is dst= Capacity >=3D FSE_compressBound(srcSize). - @return : size of compressed data (<=3D dstCapacity). - Special values : if return =3D=3D 0, srcData is not compressible =3D> = Nothing is stored within dst !!! - if return =3D=3D 1, srcData is a single byte symbol *= srcSize times. Use RLE compression instead. - if FSE_isError(return), compression failed (more deta= ils using FSE_getErrorName()) -*/ -FSE_PUBLIC_API size_t FSE_compress(void* dst, size_t dstCapacity, - const void* src, size_t srcSize); - -/*! FSE_decompress(): - Decompress FSE data from buffer 'cSrc', of size 'cSrcSize', - into already allocated destination buffer 'dst', of size 'dstCapacity'. - @return : size of regenerated data (<=3D maxDstSize), - or an error code, which can be tested using FSE_isError() . - - ** Important ** : FSE_decompress() does not decompress non-compressibl= e nor RLE data !!! - Why ? : making this distinction requires a header. - Header management is intentionally delegated to the user layer, which = can better manage special cases. 
-*/
-FSE_PUBLIC_API size_t FSE_decompress(void* dst,  size_t dstCapacity,
-                               const void* cSrc, size_t cSrcSize);
-
-
 /*-*****************************************
 *  Tool functions
 ******************************************/
@@ -88,20 +58,6 @@ FSE_PUBLIC_API unsigned FSE_isError(size_t code);        /* tells if a return
 FSE_PUBLIC_API const char* FSE_getErrorName(size_t code);   /* provides error code string (useful for debugging) */
 
 
-/*-*****************************************
-*  FSE advanced functions
-******************************************/
-/*! FSE_compress2() :
-    Same as FSE_compress(), but allows the selection of 'maxSymbolValue' and 'tableLog'
-    Both parameters can be defined as '0' to mean : use default value
-    @return : size of compressed data
-    Special values : if return == 0, srcData is not compressible => Nothing is stored within cSrc !!!
-                     if return == 1, srcData is a single byte symbol * srcSize times. Use RLE compression.
-                     if FSE_isError(return), it's an error code.
-*/
-FSE_PUBLIC_API size_t FSE_compress2 (void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog);
-
-
 /*-*****************************************
 *  FSE detailed API
 ******************************************/
@@ -161,8 +117,6 @@ FSE_PUBLIC_API size_t FSE_writeNCount (void* buffer, size_t bufferSize,
 /*! Constructor and Destructor of FSE_CTable.
     Note that FSE_CTable size depends on 'tableLog' and 'maxSymbolValue' */
 typedef unsigned FSE_CTable;   /* don't allocate that. It's only meant to be more restrictive than void* */
-FSE_PUBLIC_API FSE_CTable* FSE_createCTable (unsigned maxSymbolValue, unsigned tableLog);
-FSE_PUBLIC_API void        FSE_freeCTable (FSE_CTable* ct);
 
 /*! FSE_buildCTable():
     Builds `ct`, which must be already allocated, using FSE_createCTable().
@@ -238,23 +192,7 @@ FSE_PUBLIC_API size_t FSE_readNCount_bmi2(short* normalizedCounter,
                           unsigned* maxSymbolValuePtr, unsigned* tableLogPtr,
                           const void* rBuffer, size_t rBuffSize, int bmi2);
 
-/*! Constructor and Destructor of FSE_DTable.
-    Note that its size depends on 'tableLog' */
 typedef unsigned FSE_DTable;   /* don't allocate that. It's just a way to be more restrictive than void* */
-FSE_PUBLIC_API FSE_DTable* FSE_createDTable(unsigned tableLog);
-FSE_PUBLIC_API void        FSE_freeDTable(FSE_DTable* dt);
-
-/*! FSE_buildDTable():
-    Builds 'dt', which must be already allocated, using FSE_createDTable().
-    return : 0, or an errorCode, which can be tested using FSE_isError() */
-FSE_PUBLIC_API size_t FSE_buildDTable (FSE_DTable* dt, const short* normalizedCounter, unsigned maxSymbolValue, unsigned tableLog);
-
-/*! FSE_decompress_usingDTable():
-    Decompress compressed source `cSrc` of size `cSrcSize` using `dt`
-    into `dst` which must be already allocated.
-    @return : size of regenerated data (necessarily <= `dstCapacity`),
-              or an errorCode, which can be tested using FSE_isError() */
-FSE_PUBLIC_API size_t FSE_decompress_usingDTable(void* dst, size_t dstCapacity, const void* cSrc, size_t cSrcSize, const FSE_DTable* dt);
 
 /*!
 Tutorial :
@@ -286,13 +224,11 @@ If there is an error, the function will return an error code, which can be teste
 
 #endif  /* FSE_H */
 
+
 #if !defined(FSE_H_FSE_STATIC_LINKING_ONLY)
 #define FSE_H_FSE_STATIC_LINKING_ONLY
-
-/* *** Dependency *** */
 #include "bitstream.h"
 
-
 /* *****************************************
 *  Static allocation
 *******************************************/
@@ -317,16 +253,6 @@ If there is an error, the function will return an error code, which can be teste
 unsigned FSE_optimalTableLog_internal(unsigned maxTableLog, size_t srcSize, unsigned maxSymbolValue, unsigned minus);
 /*< same as FSE_optimalTableLog(), which used `minus==2` */
 
-/* FSE_compress_wksp() :
- * Same as FSE_compress2(), but using an externally allocated scratch buffer (`workSpace`).
- * FSE_COMPRESS_WKSP_SIZE_U32() provides the minimum size required for `workSpace` as a table of FSE_CTable.
- */
-#define FSE_COMPRESS_WKSP_SIZE_U32(maxTableLog, maxSymbolValue)   ( FSE_CTABLE_SIZE_U32(maxTableLog, maxSymbolValue) + ((maxTableLog > 12) ? (1 << (maxTableLog - 2)) : 1024) )
-size_t FSE_compress_wksp (void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void* workSpace, size_t wkspSize);
-
-size_t FSE_buildCTable_raw (FSE_CTable* ct, unsigned nbBits);
-/*< build a fake FSE_CTable, designed for a flat distribution, where each symbol uses nbBits */
-
 size_t FSE_buildCTable_rle (FSE_CTable* ct, unsigned char symbolValue);
 /*< build a fake FSE_CTable, designed to compress always the same symbolValue */
 
@@ -344,19 +270,11 @@ size_t FSE_buildCTable_wksp(FSE_CTable* ct, const short* normalizedCounter, unsi
 FSE_PUBLIC_API size_t FSE_buildDTable_wksp(FSE_DTable* dt, const short* normalizedCounter, unsigned maxSymbolValue, unsigned tableLog, void* workSpace, size_t wkspSize);
 /*< Same as FSE_buildDTable(), using an externally allocated `workspace` produced with `FSE_BUILD_DTABLE_WKSP_SIZE_U32(maxSymbolValue)` */
 
-size_t FSE_buildDTable_raw (FSE_DTable* dt, unsigned nbBits);
-/*< build a fake FSE_DTable, designed to read a flat distribution where each symbol uses nbBits */
-
-size_t FSE_buildDTable_rle (FSE_DTable* dt, unsigned char symbolValue);
-/*< build a fake FSE_DTable, designed to always generate the same symbolValue */
-
-#define FSE_DECOMPRESS_WKSP_SIZE_U32(maxTableLog, maxSymbolValue) (FSE_DTABLE_SIZE_U32(maxTableLog) + FSE_BUILD_DTABLE_WKSP_SIZE_U32(maxTableLog, maxSymbolValue) + (FSE_MAX_SYMBOL_VALUE + 1) / 2 + 1)
+#define FSE_DECOMPRESS_WKSP_SIZE_U32(maxTableLog, maxSymbolValue) (FSE_DTABLE_SIZE_U32(maxTableLog) + 1 + FSE_BUILD_DTABLE_WKSP_SIZE_U32(maxTableLog, maxSymbolValue) + (FSE_MAX_SYMBOL_VALUE + 1) / 2 + 1)
 #define FSE_DECOMPRESS_WKSP_SIZE(maxTableLog, maxSymbolValue) (FSE_DECOMPRESS_WKSP_SIZE_U32(maxTableLog, maxSymbolValue) * sizeof(unsigned))
-size_t FSE_decompress_wksp(void* dst, size_t dstCapacity, const void* cSrc, size_t cSrcSize, unsigned maxLog, void* workSpace, size_t wkspSize);
-/*< same as FSE_decompress(), using an externally allocated `workSpace` produced with `FSE_DECOMPRESS_WKSP_SIZE_U32(maxLog, maxSymbolValue)` */
-
 size_t FSE_decompress_wksp_bmi2(void* dst, size_t dstCapacity, const void* cSrc, size_t cSrcSize, unsigned maxLog, void* workSpace, size_t wkspSize, int bmi2);
-/*< Same as FSE_decompress_wksp() but with dynamic BMI2 support. Pass 1 if your CPU supports BMI2 or 0 if it doesn't. */
+/*< same as FSE_decompress(), using an externally allocated `workSpace` produced with `FSE_DECOMPRESS_WKSP_SIZE_U32(maxLog, maxSymbolValue)`.
+ * Set bmi2 to 1 if your CPU supports BMI2 or 0 if it doesn't */
 
 typedef enum {
    FSE_repeat_none,  /*< Cannot use the previous table */
@@ -539,20 +457,20 @@ MEM_STATIC void FSE_encodeSymbol(BIT_CStream_t* bitC, FSE_CState_t* statePtr, un
    FSE_symbolCompressionTransform const symbolTT = ((const FSE_symbolCompressionTransform*)(statePtr->symbolTT))[symbol];
    const U16* const stateTable = (const U16*)(statePtr->stateTable);
    U32 const nbBitsOut  = (U32)((statePtr->value + symbolTT.deltaNbBits) >> 16);
-    BIT_addBits(bitC, statePtr->value, nbBitsOut);
+    BIT_addBits(bitC, (BitContainerType)statePtr->value, nbBitsOut);
    statePtr->value = stateTable[ (statePtr->value >> nbBitsOut) + symbolTT.deltaFindState];
 }
 
 MEM_STATIC void FSE_flushCState(BIT_CStream_t* bitC, const FSE_CState_t* statePtr)
 {
-    BIT_addBits(bitC, statePtr->value, statePtr->stateLog);
+    BIT_addBits(bitC, (BitContainerType)statePtr->value, statePtr->stateLog);
    BIT_flushBits(bitC);
 }
 
 
 /* FSE_getMaxNbBits() :
 * Approximate maximum cost of a symbol, in bits.
- * Fractional get rounded up (i.e : a symbol with a normalized frequency of 3 gives the same result as a frequency of 2)
+ * Fractional get rounded up (i.e. a symbol with a normalized frequency of 3 gives the same result as a frequency of 2)
 * note 1 : assume symbolValue is valid (<= maxSymbolValue)
 * note 2 : if freq[symbolValue]==0, @return a fake cost of tableLog+1 bits */
 MEM_STATIC U32 FSE_getMaxNbBits(const void* symbolTTPtr, U32 symbolValue)
@@ -705,7 +623,4 @@ MEM_STATIC unsigned FSE_endOfDState(const FSE_DState_t* DStatePtr)
 
 #define FSE_TABLESTEP(tableSize)   (((tableSize)>>1) + ((tableSize)>>3) + 3)
 
-
 #endif /* FSE_STATIC_LINKING_ONLY */
-
-
diff --git a/lib/zstd/common/fse_decompress.c b/lib/zstd/common/fse_decompress.c
index 8dcb8ca39767..15081d8dc607 100644
--- a/lib/zstd/common/fse_decompress.c
+++ b/lib/zstd/common/fse_decompress.c
@@ -1,6 +1,7 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
 /* ******************************************************************
 * FSE : Finite State Entropy decoder
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 *
 * You can contact the author at :
 * - FSE source repository : https://github.com/Cyan4973/FiniteStateEntropy
@@ -22,8 +23,8 @@
 #define FSE_STATIC_LINKING_ONLY
 #include "fse.h"
 #include "error_private.h"
-#define ZSTD_DEPS_NEED_MALLOC
-#include "zstd_deps.h"
+#include "zstd_deps.h"   /* ZSTD_memcpy */
+#include "bits.h"        /* ZSTD_highbit32 */
 
 
 /* **************************************************************
@@ -55,19 +56,6 @@
 #define FSE_FUNCTION_NAME(X,Y) FSE_CAT(X,Y)
 #define FSE_TYPE_NAME(X,Y) FSE_CAT(X,Y)
 
-
-/* Function templates */
-FSE_DTable* FSE_createDTable (unsigned tableLog)
-{
-    if (tableLog > FSE_TABLELOG_ABSOLUTE_MAX) tableLog = FSE_TABLELOG_ABSOLUTE_MAX;
-    return (FSE_DTable*)ZSTD_malloc( FSE_DTABLE_SIZE_U32(tableLog) * sizeof (U32) );
-}
-
-void FSE_freeDTable (FSE_DTable* dt)
-{
-    ZSTD_free(dt);
-}
-
 static size_t FSE_buildDTable_internal(FSE_DTable* dt, const short* normalizedCounter, unsigned maxSymbolValue, unsigned tableLog, void* workSpace, size_t wkspSize)
 {
    void* const tdPtr = dt+1;   /* because *dt is unsigned, 32-bits aligned on 32-bits */
@@ -96,7 +84,7 @@ static size_t FSE_buildDTable_internal(FSE_DTable* dt, const short* normalizedCo
                symbolNext[s] = 1;
            } else {
                if (normalizedCounter[s] >= largeLimit) DTableH.fastMode=0;
-                symbolNext[s] = normalizedCounter[s];
+                symbolNext[s] = (U16)normalizedCounter[s];
        }   }   }
        ZSTD_memcpy(dt, &DTableH, sizeof(DTableH));
    }
@@ -111,8 +99,7 @@ static size_t FSE_buildDTable_internal(FSE_DTable* dt, const short* normalizedCo
         * all symbols have counts <= 8. We ensure we have 8 bytes at the end of
         * our buffer to handle the over-write.
         */
-        {
-            U64 const add = 0x0101010101010101ull;
+        {   U64 const add = 0x0101010101010101ull;
            size_t pos = 0;
            U64 sv = 0;
            U32 s;
@@ -123,14 +110,13 @@ static size_t FSE_buildDTable_internal(FSE_DTable* dt, const short* normalizedCo
                for (i = 8; i < n; i += 8) {
                    MEM_write64(spread + pos + i, sv);
                }
-                pos += n;
-            }
-        }
+                pos += (size_t)n;
+        }   }
        /* Now we spread those positions across the table.
-         * The benefit of doing it in two stages is that we avoid the the
+         * The benefit of doing it in two stages is that we avoid the
         * variable size inner loop, which caused lots of branch misses.
         * Now we can run through all the positions without any branch misses.
-         * We unroll the loop twice, since that is what emperically worked best.
+         * We unroll the loop twice, since that is what empirically worked best.
         */
        {   size_t position = 0;
@@ -166,7 +152,7 @@ static size_t FSE_buildDTable_internal(FSE_DTable* dt, const short* normalizedCo
    for (u=0; utableLog = 0;
-    DTableH->fastMode = 0;
-
-    cell->newState = 0;
-    cell->symbol = symbolValue;
-    cell->nbBits = 0;
-
-    return 0;
-}
-
-
-size_t FSE_buildDTable_raw (FSE_DTable* dt, unsigned nbBits)
-{
-    void* ptr = dt;
-    FSE_DTableHeader* const DTableH = (FSE_DTableHeader*)ptr;
-    void* dPtr = dt + 1;
-    FSE_decode_t* const dinfo = (FSE_decode_t*)dPtr;
-    const unsigned tableSize = 1 << nbBits;
-    const unsigned tableMask = tableSize - 1;
-    const unsigned maxSV1 = tableMask+1;
-    unsigned s;
-
-    /* Sanity checks */
-    if (nbBits < 1) return ERROR(GENERIC);         /* min size */
-
-    /* Build Decoding Table */
-    DTableH->tableLog = (U16)nbBits;
-    DTableH->fastMode = 1;
-    for (s=0; sfastMode;
-
-    /* select fast mode (static) */
-    if (fastMode) return FSE_decompress_usingDTable_generic(dst, originalSize, cSrc, cSrcSize, dt, 1);
-    return FSE_decompress_usingDTable_generic(dst, originalSize, cSrc, cSrcSize, dt, 0);
-}
-
-
-size_t FSE_decompress_wksp(void* dst, size_t dstCapacity, const void* cSrc, size_t cSrcSize, unsigned maxLog, void* workSpace, size_t wkspSize)
-{
-    return FSE_decompress_wksp_bmi2(dst, dstCapacity, cSrc, cSrcSize, maxLog, workSpace, wkspSize, /* bmi2 */ 0);
+    assert(op >= ostart);
+    return (size_t)(op-ostart);
 }
 
 typedef struct {
    short ncount[FSE_MAX_SYMBOL_VALUE + 1];
-    FSE_DTable dtable[]; /* Dynamically sized */
 } FSE_DecompressWksp;
 
 
@@ -327,13 +252,18 @@ FORCE_INLINE_TEMPLATE size_t FSE_decompress_wksp_body(
    unsigned tableLog;
    unsigned maxSymbolValue = FSE_MAX_SYMBOL_VALUE;
    FSE_DecompressWksp* const wksp = (FSE_DecompressWksp*)workSpace;
+    size_t const dtablePos = sizeof(FSE_DecompressWksp) / sizeof(FSE_DTable);
+    FSE_DTable* const dtable = (FSE_DTable*)workSpace + dtablePos;
 
-    DEBUG_STATIC_ASSERT((FSE_MAX_SYMBOL_VALUE + 1) % 2 == 0);
+    FSE_STATIC_ASSERT((FSE_MAX_SYMBOL_VALUE + 1) % 2 == 0);
    if (wkspSize < sizeof(*wksp)) return ERROR(GENERIC);
 
+    /* correct offset to dtable depends on this property */
+    FSE_STATIC_ASSERT(sizeof(FSE_DecompressWksp) % sizeof(FSE_DTable) == 0);
+
    /* normal FSE decoding mode */
-    {
-        size_t const NCountLength = FSE_readNCount_bmi2(wksp->ncount, &maxSymbolValue, &tableLog, istart, cSrcSize, bmi2);
+    {   size_t const NCountLength =
+            FSE_readNCount_bmi2(wksp->ncount, &maxSymbolValue, &tableLog, istart, cSrcSize, bmi2);
        if (FSE_isError(NCountLength)) return NCountLength;
        if (tableLog > maxLog) return ERROR(tableLog_tooLarge);
        assert(NCountLength <= cSrcSize);
@@ -342,19 +272,20 @@ FORCE_INLINE_TEMPLATE size_t FSE_decompress_wksp_body(
    }
 
    if (FSE_DECOMPRESS_WKSP_SIZE(tableLog, maxSymbolValue) > wkspSize) return ERROR(tableLog_tooLarge);
-    workSpace = wksp->dtable + FSE_DTABLE_SIZE_U32(tableLog);
+    assert(sizeof(*wksp) + FSE_DTABLE_SIZE(tableLog) <= wkspSize);
+    workSpace = (BYTE*)workSpace + sizeof(*wksp) + FSE_DTABLE_SIZE(tableLog);
    wkspSize -= sizeof(*wksp) + FSE_DTABLE_SIZE(tableLog);
 
-    CHECK_F( FSE_buildDTable_internal(wksp->dtable, wksp->ncount, maxSymbolValue, tableLog, workSpace, wkspSize) );
+    CHECK_F( FSE_buildDTable_internal(dtable, wksp->ncount, maxSymbolValue, tableLog, workSpace, wkspSize) );
 
    {
-        const void* ptr = wksp->dtable;
+        const void* ptr = dtable;
        const FSE_DTableHeader* DTableH = (const FSE_DTableHeader*)ptr;
        const U32 fastMode = DTableH->fastMode;
 
        /* select fast mode (static) */
-        if (fastMode) return FSE_decompress_usingDTable_generic(dst, dstCapacity, ip, cSrcSize, wksp->dtable, 1);
-        return FSE_decompress_usingDTable_generic(dst, dstCapacity, ip, cSrcSize, wksp->dtable, 0);
+        if (fastMode) return FSE_decompress_usingDTable_generic(dst, dstCapacity, ip, cSrcSize, dtable, 1);
+        return FSE_decompress_usingDTable_generic(dst, dstCapacity, ip, cSrcSize, dtable, 0);
    }
 }
 
@@ -382,9 +313,4 @@ size_t FSE_decompress_wksp_bmi2(void* dst, size_t dstCapacity, const void* cSrc,
    return FSE_decompress_wksp_body_default(dst, dstCapacity, cSrc, cSrcSize, maxLog, workSpace, wkspSize);
 }
 
-
-typedef FSE_DTable DTable_max_t[FSE_DTABLE_SIZE_U32(FSE_MAX_TABLELOG)];
-
-
-
 #endif /* FSE_COMMONDEFS_ONLY */
diff --git a/lib/zstd/common/huf.h b/lib/zstd/common/huf.h
index 5042ff870308..49736dcd8f49 100644
--- a/lib/zstd/common/huf.h
+++ b/lib/zstd/common/huf.h
@@ -1,7 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /* ******************************************************************
 * huff0 huffman codec,
 * part of Finite State Entropy library
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 *
 * You can contact the author at :
 * - Source repository : https://github.com/Cyan4973/FiniteStateEntropy
@@ -12,105 +13,26 @@
 * You may select, at your option, one of the above-listed licenses.
 ****************************************************************** */
 
-
 #ifndef HUF_H_298734234
 #define HUF_H_298734234
 
 /* *** Dependencies *** */
 #include "zstd_deps.h"    /* size_t */
-
-
-/* *** library symbols visibility *** */
-/* Note : when linking with -fvisibility=hidden on gcc, or by default on Visual,
- *        HUF symbols remain "private" (internal symbols for library only).
- *        Set macro FSE_DLL_EXPORT to 1 if you want HUF symbols visible on DLL interface */
-#if defined(FSE_DLL_EXPORT) && (FSE_DLL_EXPORT==1) && defined(__GNUC__) && (__GNUC__ >= 4)
-#  define HUF_PUBLIC_API __attribute__ ((visibility ("default")))
-#elif defined(FSE_DLL_EXPORT) && (FSE_DLL_EXPORT==1)   /* Visual expected */
-#  define HUF_PUBLIC_API __declspec(dllexport)
-#elif defined(FSE_DLL_IMPORT) && (FSE_DLL_IMPORT==1)
-#  define HUF_PUBLIC_API __declspec(dllimport)  /* not required, just to generate faster code (saves a function pointer load from IAT and an indirect jump) */
-#else
-#  define HUF_PUBLIC_API
-#endif
-
-
-/* ========================== */
-/* ***  simple functions  *** */
-/* ========================== */
-
-/* HUF_compress() :
- *  Compress content from buffer 'src', of size 'srcSize', into buffer 'dst'.
- *  'dst' buffer must be already allocated.
- *  Compression runs faster if `dstCapacity` >= HUF_compressBound(srcSize).
- *  `srcSize` must be <= `HUF_BLOCKSIZE_MAX` == 128 KB.
- * @return : size of compressed data (<= `dstCapacity`).
- *           Special values : if return == 0, srcData is not compressible => Nothing is stored within dst !!!
- *                            if HUF_isError(return), compression failed (more details using HUF_getErrorName())
- */
-HUF_PUBLIC_API size_t HUF_compress(void* dst, size_t dstCapacity,
-                             const void* src, size_t srcSize);
-
-/* HUF_decompress() :
- *  Decompress HUF data from buffer 'cSrc', of size 'cSrcSize',
- *  into already allocated buffer 'dst', of minimum size 'dstSize'.
- *  `originalSize` : **must** be the ***exact*** size of original (uncompressed) data.
- *  Note : in contrast with FSE, HUF_decompress can regenerate
- *         RLE (cSrcSize==1) and uncompressed (cSrcSize==dstSize) data,
- *         because it knows size to regenerate (originalSize).
- * @return : size of regenerated data (== originalSize),
- *           or an error code, which can be tested using HUF_isError()
- */
-HUF_PUBLIC_API size_t HUF_decompress(void* dst,  size_t originalSize,
-                               const void* cSrc, size_t cSrcSize);
-
+#include "mem.h"    /* U32 */
+#define FSE_STATIC_LINKING_ONLY
+#include "fse.h"
 
 /* ***   Tool functions *** */
-#define HUF_BLOCKSIZE_MAX (128 * 1024)                  /*< maximum input size for a single block compressed with HUF_compress */
-HUF_PUBLIC_API size_t HUF_compressBound(size_t size);   /*< maximum compressed size (worst case) */
+#define HUF_BLOCKSIZE_MAX (128 * 1024)   /*< maximum input size for a single block compressed with HUF_compress */
+size_t HUF_compressBound(size_t size);   /*< maximum compressed size (worst case) */
 
 /* Error Management */
-HUF_PUBLIC_API unsigned    HUF_isError(size_t code);       /*< tells if a return value is an error code */
-HUF_PUBLIC_API const char* HUF_getErrorName(size_t code);  /*< provides error code string (useful for debugging) */
+unsigned    HUF_isError(size_t code);       /*< tells if a return value is an error code */
+const char* HUF_getErrorName(size_t code);  /*< provides error code string (useful for debugging) */
 
 
-/* ***   Advanced function   *** */
-
-/* HUF_compress2() :
- *  Same as HUF_compress(), but offers control over `maxSymbolValue` and `tableLog`.
- *  `maxSymbolValue` must be <= HUF_SYMBOLVALUE_MAX .
- *  `tableLog` must be `<= HUF_TABLELOG_MAX` . */
-HUF_PUBLIC_API size_t HUF_compress2 (void* dst, size_t dstCapacity,
-                               const void* src, size_t srcSize,
-                               unsigned maxSymbolValue, unsigned tableLog);
-
-/* HUF_compress4X_wksp() :
- *  Same as HUF_compress2(), but uses externally allocated `workSpace`.
- *  `workspace` must be at least as large as HUF_WORKSPACE_SIZE */
 #define HUF_WORKSPACE_SIZE ((8 << 10) + 512 /* sorting scratch space */)
 #define HUF_WORKSPACE_SIZE_U64 (HUF_WORKSPACE_SIZE / sizeof(U64))
-HUF_PUBLIC_API size_t HUF_compress4X_wksp (void* dst, size_t dstCapacity,
-                                     const void* src, size_t srcSize,
-                                     unsigned maxSymbolValue, unsigned tableLog,
-                                     void* workSpace, size_t wkspSize);
-
-#endif   /* HUF_H_298734234 */
-
-/* ******************************************************************
- *  WARNING !!
- *  The following section contains advanced and experimental definitions
- *  which shall never be used in the context of a dynamic library,
- *  because they are not guaranteed to remain stable in the future.
- *  Only consider them in association with static linking.
- * *****************************************************************/
-#if !defined(HUF_H_HUF_STATIC_LINKING_ONLY)
-#define HUF_H_HUF_STATIC_LINKING_ONLY
-
-/* *** Dependencies *** */
-#include "mem.h"   /* U32 */
-#define FSE_STATIC_LINKING_ONLY
-#include "fse.h"
-
 
 /* *** Constants *** */
 #define HUF_TABLELOG_MAX      12      /* max runtime value of tableLog (due to static allocation); can be modified up to HUF_TABLELOG_ABSOLUTEMAX */
@@ -151,25 +73,49 @@ typedef U32 HUF_DTable;
 /* ****************************************
 *  Advanced decompression functions
 ******************************************/
-size_t HUF_decompress4X1 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /*< single-symbol decoder */
-#ifndef HUF_FORCE_DECOMPRESS_X1
-size_t HUF_decompress4X2 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /*< double-symbols decoder */
-#endif
 
-size_t HUF_decompress4X_DCtx (HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /*< decodes RLE and uncompressed */
-size_t HUF_decompress4X_hufOnly(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /*< considers RLE and uncompressed as errors */
-size_t HUF_decompress4X_hufOnly_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize);   /*< considers RLE and uncompressed as errors */
-size_t HUF_decompress4X1_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /*< single-symbol decoder */
-size_t HUF_decompress4X1_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize);   /*< single-symbol decoder */
-#ifndef HUF_FORCE_DECOMPRESS_X1
-size_t HUF_decompress4X2_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /*< double-symbols decoder */
-size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize);   /*< double-symbols decoder */
-#endif
+/*
+ * Huffman flags bitset.
+ * For all flags, 0 is the default value.
+ */
+typedef enum {
+    /*
+     * If compiled with DYNAMIC_BMI2: Set flag only if the CPU supports BMI2 at runtime.
+     * Otherwise: Ignored.
+     */
+    HUF_flags_bmi2 = (1 << 0),
+    /*
+     * If set: Test possible table depths to find the one that produces the smallest header + encoded size.
+     * If unset: Use heuristic to find the table depth.
+     */
+    HUF_flags_optimalDepth = (1 << 1),
+    /*
+     * If set: If the previous table can encode the input, always reuse the previous table.
+     * If unset: If the previous table can encode the input, reuse the previous table if it results in a smaller output.
+     */
+    HUF_flags_preferRepeat = (1 << 2),
+    /*
+     * If set: Sample the input and check if the sample is uncompressible, if it is then don't attempt to compress.
+     * If unset: Always histogram the entire input.
+     */
+    HUF_flags_suspectUncompressible = (1 << 3),
+    /*
+     * If set: Don't use assembly implementations
+     * If unset: Allow using assembly implementations
+     */
+    HUF_flags_disableAsm = (1 << 4),
+    /*
+     * If set: Don't use the fast decoding loop, always use the fallback decoding loop.
+     * If unset: Use the fast decoding loop when possible.
+     */
+    HUF_flags_disableFast = (1 << 5)
+} HUF_flags_e;
 
 
 /* ****************************************
 *  HUF detailed API
 * ****************************************/
+#define HUF_OPTIMAL_DEPTH_THRESHOLD ZSTD_btultra
 
 /*! HUF_compress() does the following:
 *  1. count symbol occurrence from source[] into table count[] using FSE_count() (exposed within "fse.h")
@@ -182,12 +128,12 @@ size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize,
 *  For example, it's possible to compress several blocks using the same 'CTable',
 *  or to save and regenerate 'CTable' using external methods.
 */
-unsigned HUF_optimalTableLog(unsigned maxTableLog, size_t srcSize, unsigned maxSymbolValue);
-size_t HUF_buildCTable (HUF_CElt* CTable, const unsigned* count, unsigned maxSymbolValue, unsigned maxNbBits);   /* @return : maxNbBits; CTable and count can overlap. In which case, CTable will overwrite count content */
-size_t HUF_writeCTable (void* dst, size_t maxDstSize, const HUF_CElt* CTable, unsigned maxSymbolValue, unsigned huffLog);
+unsigned HUF_minTableLog(unsigned symbolCardinality);
+unsigned HUF_cardinality(const unsigned* count, unsigned maxSymbolValue);
+unsigned HUF_optimalTableLog(unsigned maxTableLog, size_t srcSize, unsigned maxSymbolValue, void* workSpace,
+ size_t wkspSize, HUF_CElt* table, const unsigned* count, int flags); /* table is used as scratch space for building and testing tables, not a return value */
 size_t HUF_writeCTable_wksp(void* dst, size_t maxDstSize, const HUF_CElt* CTable, unsigned maxSymbolValue, unsigned huffLog, void* workspace, size_t workspaceSize);
-size_t HUF_compress4X_usingCTable(void* dst, size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable);
-size_t HUF_compress4X_usingCTable_bmi2(void* dst, size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable, int bmi2);
+size_t HUF_compress4X_usingCTable(void* dst, size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable, int flags);
 size_t HUF_estimateCompressedSize(const HUF_CElt* CTable, const unsigned* count, unsigned maxSymbolValue);
 int HUF_validateCTable(const HUF_CElt* CTable, const unsigned* count, unsigned maxSymbolValue);
 
@@ -196,6 +142,7 @@ typedef enum {
    HUF_repeat_check, /*< Can use the previous table but it must be checked. Note : The previous table must have been constructed by HUF_compress{1, 4}X_repeat */
    HUF_repeat_valid  /*< Can use the previous table and it is assumed to be valid */
 } HUF_repeat;
+
 /* HUF_compress4X_repeat() :
 *  Same as HUF_compress4X_wksp(), but considers using hufTable if *repeat != HUF_repeat_none.
 *  If it uses hufTable it does not modify hufTable or repeat.
@@ -206,13 +153,13 @@ size_t HUF_compress4X_repeat(void* dst, size_t dstSize,
                       const void* src, size_t srcSize,
                       unsigned maxSymbolValue, unsigned tableLog,
                       void* workSpace, size_t wkspSize,    /*< `workSpace` must be aligned on 4-bytes boundaries, `wkspSize` must be >= HUF_WORKSPACE_SIZE */
-                       HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat, int bmi2, unsigned suspectUncompressible);
+                       HUF_CElt* hufTable, HUF_repeat* repeat, int flags);
 
 /* HUF_buildCTable_wksp() :
 *  Same as HUF_buildCTable(), but using externally allocated scratch buffer.
 * `workSpace` must be aligned on 4-bytes boundaries, and its size must be >= HUF_CTABLE_WORKSPACE_SIZE.
 */
-#define HUF_CTABLE_WORKSPACE_SIZE_U32 (2*HUF_SYMBOLVALUE_MAX +1 +1)
+#define HUF_CTABLE_WORKSPACE_SIZE_U32 ((4 * (HUF_SYMBOLVALUE_MAX + 1)) + 192)
 #define HUF_CTABLE_WORKSPACE_SIZE (HUF_CTABLE_WORKSPACE_SIZE_U32 * sizeof(unsigned))
 size_t HUF_buildCTable_wksp (HUF_CElt* tree,
                       const unsigned* count, U32 maxSymbolValue, U32 maxNbBits,
@@ -238,7 +185,7 @@ size_t HUF_readStats_wksp(BYTE* huffWeight, size_t hwSize,
                          U32* rankStats, U32* nbSymbolsPtr, U32* tableLogPtr,
                          const void* src, size_t srcSize,
                          void* workspace, size_t wkspSize,
-                          int bmi2);
+                          int flags);
 
 /* HUF_readCTable() :
 *  Loading a CTable saved with HUF_writeCTable() */
@@ -246,9 +193,22 @@ size_t HUF_readCTable (HUF_CElt* CTable, unsigned* maxSymbolValuePtr, const void
 
 /* HUF_getNbBitsFromCTable() :
 *  Read nbBits from CTable symbolTable, for symbol `symbolValue` presumed <= HUF_SYMBOLVALUE_MAX
- *  Note 1 : is not inlined, as HUF_CElt definition is private */
+ *  Note 1 : If symbolValue > HUF_readCTableHeader(symbolTable).maxSymbolValue, returns 0
+ *  Note 2 : is not inlined, as HUF_CElt definition is private
+ */
 U32 HUF_getNbBitsFromCTable(const HUF_CElt* symbolTable, U32 symbolValue);
 
+typedef struct {
+    BYTE tableLog;
+    BYTE maxSymbolValue;
+    BYTE unused[sizeof(size_t) - 2];
+} HUF_CTableHeader;
+
+/* HUF_readCTableHeader() :
+ * @returns The header from the CTable specifying the tableLog and the maxSymbolValue.
+ */
+HUF_CTableHeader HUF_readCTableHeader(HUF_CElt const* ctable);
+
 /*
 * HUF_decompress() does the following:
 * 1. select the decompression algorithm (X1, X2) based on pre-computed heuristics
@@ -276,32 +236,12 @@ U32 HUF_selectDecoder (size_t dstSize, size_t cSrcSize);
 #define HUF_DECOMPRESS_WORKSPACE_SIZE ((2 << 10) + (1 << 9))
 #define HUF_DECOMPRESS_WORKSPACE_SIZE_U32 (HUF_DECOMPRESS_WORKSPACE_SIZE / sizeof(U32))
 
-#ifndef HUF_FORCE_DECOMPRESS_X2
-size_t HUF_readDTableX1 (HUF_DTable* DTable, const void* src, size_t srcSize);
-size_t HUF_readDTableX1_wksp (HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize);
-#endif
-#ifndef HUF_FORCE_DECOMPRESS_X1
-size_t HUF_readDTableX2 (HUF_DTable* DTable, const void* src, size_t srcSize);
-size_t HUF_readDTableX2_wksp (HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize);
-#endif
-
-size_t HUF_decompress4X_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);
-#ifndef HUF_FORCE_DECOMPRESS_X2
-size_t HUF_decompress4X1_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);
-#endif
-#ifndef HUF_FORCE_DECOMPRESS_X1
-size_t HUF_decompress4X2_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);
-#endif
-
 
 /* ====================== */
 /* single stream variants */
 /* ====================== */
 
-size_t HUF_compress1X (void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog);
-size_t HUF_compress1X_wksp (void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog, void* workSpace, size_t wkspSize);  /*< `workSpace` must be a table of at least HUF_WORKSPACE_SIZE_U64 U64 */
-size_t HUF_compress1X_usingCTable(void* dst, size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable);
-size_t HUF_compress1X_usingCTable_bmi2(void* dst, size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable, int bmi2);
+size_t HUF_compress1X_usingCTable(void* dst, size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable, int flags);
 /* HUF_compress1X_repeat() :
 *  Same as HUF_compress1X_wksp(), but considers using hufTable if *repeat != HUF_repeat_none.
 *  If it uses hufTable it does not modify hufTable or repeat.
@@ -312,47 +252,27 @@ size_t HUF_compress1X_repeat(void* dst, size_t dstSize,
                       const void* src, size_t srcSize,
                       unsigned maxSymbolValue, unsigned tableLog,
                       void* workSpace, size_t wkspSize,   /*< `workSpace` must be aligned on 4-bytes boundaries, `wkspSize` must be >= HUF_WORKSPACE_SIZE */
-                       HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat, int bmi2, unsigned suspectUncompressible);
-
-size_t HUF_decompress1X1 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /* single-symbol decoder */
-#ifndef HUF_FORCE_DECOMPRESS_X1
-size_t HUF_decompress1X2 (void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /* double-symbol decoder */
-#endif
-
-size_t HUF_decompress1X_DCtx (HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);
-size_t HUF_decompress1X_DCtx_wksp (HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize);
-#ifndef HUF_FORCE_DECOMPRESS_X2
-size_t HUF_decompress1X1_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /*< single-symbol decoder */
-size_t HUF_decompress1X1_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize);   /*< single-symbol decoder */
-#endif
-#ifndef HUF_FORCE_DECOMPRESS_X1
-size_t HUF_decompress1X2_DCtx(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize);   /*< double-symbols decoder */
-size_t HUF_decompress1X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize);   /*< double-symbols decoder */
-#endif
+                       HUF_CElt* hufTable, HUF_repeat* repeat, int flags);
 
-size_t HUF_decompress1X_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);   /*< automatic selection of sing or double symbol decoder, based on DTable */
-#ifndef HUF_FORCE_DECOMPRESS_X2
-size_t HUF_decompress1X1_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);
-#endif
+size_t HUF_decompress1X_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int flags);
 #ifndef HUF_FORCE_DECOMPRESS_X1
-size_t HUF_decompress1X2_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable);
+size_t HUF_decompress1X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int flags);   /*< double-symbols decoder */
 #endif
 
 /* BMI2 variants.
 * If the CPU has BMI2 support, pass bmi2=1, otherwise pass bmi2=0.
 */
-size_t HUF_decompress1X_usingDTable_bmi2(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int bmi2);
+size_t HUF_decompress1X_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int flags);
 #ifndef HUF_FORCE_DECOMPRESS_X2
-size_t HUF_decompress1X1_DCtx_wksp_bmi2(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int bmi2);
+size_t HUF_decompress1X1_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int flags);
 #endif
-size_t HUF_decompress4X_usingDTable_bmi2(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int bmi2);
-size_t HUF_decompress4X_hufOnly_wksp_bmi2(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int bmi2);
+size_t HUF_decompress4X_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int flags);
+size_t HUF_decompress4X_hufOnly_wksp(HUF_DTable* dctx, void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, int flags);
 #ifndef HUF_FORCE_DECOMPRESS_X2
-size_t HUF_readDTableX1_wksp_bmi2(HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize, int bmi2);
+size_t HUF_readDTableX1_wksp(HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize, int flags);
 #endif
 #ifndef HUF_FORCE_DECOMPRESS_X1
-size_t HUF_readDTableX2_wksp_bmi2(HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize, int bmi2);
+size_t HUF_readDTableX2_wksp(HUF_DTable* DTable, const void* src, size_t srcSize, void* workSpace, size_t wkspSize, int flags);
 #endif
 
-#endif /* HUF_STATIC_LINKING_ONLY */
-
+#endif   /* HUF_H_298734234 */
diff --git a/lib/zstd/common/mem.h b/lib/zstd/common/mem.h
index c22a2e69bf46..d9bd752fe17b 100644
--- a/lib/zstd/common/mem.h
+++ b/lib/zstd/common/mem.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under both the BSD-style license (found in the
@@ -24,6 +24,7 @@
 /*-****************************************
 *  Compiler specifics
 ******************************************/
+#undef MEM_STATIC /* may be already defined from common/compiler.h */
 #define MEM_STATIC static inline
 
 /*-**************************************************************
diff --git a/lib/zstd/common/portability_macros.h b/lib/zstd/common/portability_macros.h
index 0e3b2c0a527d..05286af72683 100644
--- a/lib/zstd/common/portability_macros.h
+++ b/lib/zstd/common/portability_macros.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under both the BSD-style license (found in the
@@ -12,7 +13,7 @@
 #define ZSTD_PORTABILITY_MACROS_H
 
 /*
- * This header file contains macro defintions to support portability.
+ * This header file contains macro definitions to support portability.
 * This header is shared between C and ASM code, so it MUST only
 * contain macro definitions. It MUST not contain any C code.
 *
@@ -45,30 +46,35 @@
 /* Mark the internal assembly functions as hidden  */
 #ifdef __ELF__
 # define ZSTD_HIDE_ASM_FUNCTION(func) .hidden func
+#elif defined(__APPLE__)
+# define ZSTD_HIDE_ASM_FUNCTION(func) .private_extern func
 #else
 # define ZSTD_HIDE_ASM_FUNCTION(func)
 #endif
 
+/* Compile time determination of BMI2 support */
+
+
 /* Enable runtime BMI2 dispatch based on the CPU.
 * Enabled for clang & gcc >=4.8 on x86 when BMI2 isn't enabled by default.
 */
 #ifndef DYNAMIC_BMI2
-  #if ((defined(__clang__) && __has_attribute(__target__)) \
+# if ((defined(__clang__) && __has_attribute(__target__)) \
      || (defined(__GNUC__) \
          && (__GNUC__ >= 5 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8)))) \
-      && (defined(__x86_64__) || defined(_M_X64)) \
+      && (defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)) \
      && !defined(__BMI2__)
-  #  define DYNAMIC_BMI2 1
-  #else
-  #  define DYNAMIC_BMI2 0
-  #endif
+#  define DYNAMIC_BMI2 1
+# else
+#  define DYNAMIC_BMI2 0
+# endif
 #endif
 
 /*
- * Only enable assembly for GNUC comptabile compilers,
+ * Only enable assembly for GNU C compatible compilers,
 * because other platforms may not support GAS assembly syntax.
 *
- * Only enable assembly for Linux / MacOS, other platforms may
+ * Only enable assembly for Linux / MacOS / Win32, other platforms may
 * work, but they haven't been tested. This could likely be
 * extended to BSD systems.
 *
@@ -90,4 +96,23 @@
 */
 #define ZSTD_ENABLE_ASM_X86_64_BMI2 0
 
+/*
+ * For x86 ELF targets, add .note.gnu.property section for Intel CET in
+ * assembly sources when CET is enabled.
+ *
+ * Additionally, any function that may be called indirectly must begin
+ * with ZSTD_CET_ENDBRANCH.
+ */
+#if defined(__ELF__) && (defined(__x86_64__) || defined(__i386__)) \
+    && defined(__has_include)
+# if __has_include(<cet.h>)
+#  include <cet.h>
+#  define ZSTD_CET_ENDBRANCH _CET_ENDBR
+# endif
+#endif
+
+#ifndef ZSTD_CET_ENDBRANCH
+# define ZSTD_CET_ENDBRANCH
+#endif
+
 #endif /* ZSTD_PORTABILITY_MACROS_H */
diff --git a/lib/zstd/common/zstd_common.c b/lib/zstd/common/zstd_common.c
index 3d7e35b309b5..44b95b25344a 100644
--- a/lib/zstd/common/zstd_common.c
+++ b/lib/zstd/common/zstd_common.c
@@ -1,5 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under both the BSD-style license (found in the
@@ -14,7 +15,6 @@
 *  Dependencies
 ***************************************/
 #define ZSTD_DEPS_NEED_MALLOC
-#include "zstd_deps.h"   /* ZSTD_malloc, ZSTD_calloc, ZSTD_free, ZSTD_memset */
 #include "error_private.h"
 #include "zstd_internal.h"
 
@@ -47,37 +47,3 @@ ZSTD_ErrorCode ZSTD_getErrorCode(size_t code) { return ERR_getErrorCode(code); }
 /*! ZSTD_getErrorString() :
 *  provides error code string from enum */
 const char* ZSTD_getErrorString(ZSTD_ErrorCode code) { return ERR_getErrorString(code); }
-
-
-
-/*=**************************************************************
-*  Custom allocator
-****************************************************************/
-void* ZSTD_customMalloc(size_t size, ZSTD_customMem customMem)
-{
-    if (customMem.customAlloc)
-        return customMem.customAlloc(customMem.opaque, size);
-    return ZSTD_malloc(size);
-}
-
-void* ZSTD_customCalloc(size_t size, ZSTD_customMem customMem)
-{
-    if (customMem.customAlloc) {
-        /* calloc implemented as malloc+memset;
-         * not as efficient as calloc, but next best guess for custom malloc */
-        void* const ptr = customMem.customAlloc(customMem.opaque, size);
-        ZSTD_memset(ptr, 0, size);
-        return ptr;
-    }
-    return ZSTD_calloc(1, size);
-}
-
-void ZSTD_customFree(void* ptr, ZSTD_customMem customMem)
-{
-    if (ptr!=NULL) {
-        if (customMem.customFree)
-            customMem.customFree(customMem.opaque, ptr);
-        else
-            ZSTD_free(ptr);
-    }
-}
diff --git a/lib/zstd/common/zstd_deps.h b/lib/zstd/common/zstd_deps.h
index 2c34e8a33a1c..f931f7d0e294 100644
--- a/lib/zstd/common/zstd_deps.h
+++ b/lib/zstd/common/zstd_deps.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under both the BSD-style license (found in the
@@ -105,3 +105,17 @@ static uint64_t ZSTD_div64(uint64_t dividend, uint32_t divisor) {
 
 #endif /* ZSTD_DEPS_IO */
 #endif /* ZSTD_DEPS_NEED_IO */
+
+/*
+ * Only requested when MSAN is enabled.
+ * Need:
+ * intptr_t
+ */
+#ifdef ZSTD_DEPS_NEED_STDINT
+#ifndef ZSTD_DEPS_STDINT
+#define ZSTD_DEPS_STDINT
+
+/* intptr_t already provided by ZSTD_DEPS_COMMON */
+
+#endif /* ZSTD_DEPS_STDINT */
+#endif /* ZSTD_DEPS_NEED_STDINT */
diff --git a/lib/zstd/common/zstd_internal.h b/lib/zstd/common/zstd_internal.h
index 93305d9b41bb..52a79435caf6 100644
--- a/lib/zstd/common/zstd_internal.h
+++ b/lib/zstd/common/zstd_internal.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under both the BSD-style license (found in the
@@ -28,12 +29,10 @@
 #include <linux/zstd.h>
 #define FSE_STATIC_LINKING_ONLY
 #include "fse.h"
-#define HUF_STATIC_LINKING_ONLY
 #include "huf.h"
 #include <linux/xxhash.h>                /* XXH_reset, update, digest */
 #define ZSTD_TRACE 0
 
-
 /* ---- static assert (debug) --- */
 #define ZSTD_STATIC_ASSERT(c) DEBUG_STATIC_ASSERT(c)
 #define ZSTD_isError ERR_isError   /* for inlining */
@@ -83,16 +82,17 @@ typedef enum { bt_raw, bt_rle, bt_compressed, bt_reserved } blockType_e;
 #define ZSTD_FRAMECHECKSUMSIZE 4
 
 #define MIN_SEQUENCES_SIZE 1 /* nbSeq==0 */
-#define MIN_CBLOCK_SIZE (1 /*litCSize*/ + 1 /* RLE or RAW */ + MIN_SEQUENCES_SIZE /* nbSeq==0 */)   /* for a non-null block */
+#define MIN_CBLOCK_SIZE (1 /*litCSize*/ + 1 /* RLE or RAW */)   /* for a non-null block */
+#define MIN_LITERALS_FOR_4_STREAMS 6
 
-#define HufLog 12
-typedef enum { set_basic, set_rle, set_compressed, set_repeat } symbolEncodingType_e;
+typedef enum { set_basic, set_rle, set_compressed, set_repeat } SymbolEncodingType_e;
 
 #define LONGNBSEQ 0x7F00
 
 #define MINMATCH 3
 
 #define Litbits  8
+#define LitHufLog 11
 #define MaxLit ((1<= WILDCOPY_VECLEN || diff <= -WILDCOPY_VECLEN);
@@ -225,12 +227,6 @@ void ZSTD_wildcopy(void* dst, const void* src, ptrdiff_t length, ZSTD_overlap_e
         * one COPY16() in the first call. Then, do two calls per loop since
         * at that point it is more likely to have a high trip count.
         */
-#ifdef __aarch64__
-        do {
-            COPY16(op, ip);
-        }
-        while (op < oend);
-#else
        ZSTD_copy16(op, ip);
        if (16 >= length) return;
        op += 16;
@@ -240,7 +236,6 @@ void ZSTD_wildcopy(void* dst, const void* src, ptrdiff_t length, ZSTD_overlap_e
            COPY16(op, ip);
        }
        while (op < oend);
-#endif
    }
 }
 
@@ -273,62 +268,6 @@ typedef enum {
 /*-*******************************************
 *  Private declarations
 *********************************************/
-typedef struct seqDef_s {
-    U32 offBase;   /* offBase == Offset + ZSTD_REP_NUM, or repcode 1,2,3 */
-    U16 litLength;
-    U16 mlBase;    /* mlBase == matchLength - MINMATCH */
-} seqDef;
-
-/* Controls whether seqStore has a single "long" litLength or matchLength. See seqStore_t. */
-typedef enum {
-    ZSTD_llt_none = 0,             /* no longLengthType */
-    ZSTD_llt_literalLength = 1,    /* represents a long literal */
-    ZSTD_llt_matchLength = 2       /* represents a long match */
-} ZSTD_longLengthType_e;
-
-typedef struct {
-    seqDef* sequencesStart;
-    seqDef* sequences;      /* ptr to end of sequences */
-    BYTE* litStart;
-    BYTE* lit;              /* ptr to end of literals */
-    BYTE* llCode;
-    BYTE* mlCode;
-    BYTE* ofCode;
-    size_t maxNbSeq;
-    size_t maxNbLit;
-
-    /* longLengthPos and longLengthType to allow us to represent either a single litLength or matchLength
-     * in the seqStore that has a value larger than U16 (if it exists). To do so, we increment
-     * the existing value of the litLength or matchLength by 0x10000.
-     */
-    ZSTD_longLengthType_e longLengthType;
-    U32                   longLengthPos;   /* Index of the sequence to apply long length modification to */
-} seqStore_t;
-
-typedef struct {
-    U32 litLength;
-    U32 matchLength;
-} ZSTD_sequenceLength;
-
-/*
- * Returns the ZSTD_sequenceLength for the given sequences. It handles the decoding of long sequences
- * indicated by longLengthPos and longLengthType, and adds MINMATCH back to matchLength.
- */
-MEM_STATIC ZSTD_sequenceLength ZSTD_getSequenceLength(seqStore_t const* seqStore, seqDef const* seq)
-{
-    ZSTD_sequenceLength seqLen;
-    seqLen.litLength = seq->litLength;
-    seqLen.matchLength = seq->mlBase + MINMATCH;
-    if (seqStore->longLengthPos == (U32)(seq - seqStore->sequencesStart)) {
-        if (seqStore->longLengthType == ZSTD_llt_literalLength) {
-            seqLen.litLength += 0xFFFF;
-        }
-        if (seqStore->longLengthType == ZSTD_llt_matchLength) {
-            seqLen.matchLength += 0xFFFF;
-        }
-    }
-    return seqLen;
-}
 
 /*
 * Contains the compressed frame size and an upper-bound for the decompressed frame size.
@@ -337,74 +276,11 @@ MEM_STATIC ZSTD_sequenceLength ZSTD_getSequenceLength(seqStore_t const* seqStore
 * `decompressedBound != ZSTD_CONTENTSIZE_ERROR`
 */
 typedef struct {
+    size_t nbBlocks;
    size_t compressedSize;
    unsigned long long decompressedBound;
 } ZSTD_frameSizeInfo;   /* decompress & legacy */
 
-const seqStore_t* ZSTD_getSeqStore(const ZSTD_CCtx* ctx);   /* compress & dictBuilder */
-void ZSTD_seqToCodes(const seqStore_t* seqStorePtr);   /* compress, dictBuilder, decodeCorpus (shouldn't get its definition from here) */
-
-/* custom memory allocation functions */
-void* ZSTD_customMalloc(size_t size, ZSTD_customMem customMem);
-void* ZSTD_customCalloc(size_t size, ZSTD_customMem customMem);
-void ZSTD_customFree(void* ptr, ZSTD_customMem customMem);
-
-
-MEM_STATIC U32 ZSTD_highbit32(U32 val)   /* compress, dictBuilder, decodeCorpus */
-{
-    assert(val != 0);
-    {
-#   if (__GNUC__ >= 3)   /* GCC Intrinsic */
-        return __builtin_clz (val) ^ 31;
-#   else   /* Software version */
-        static const U32 DeBruijnClz[32] = { 0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, 8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31 };
-        U32 v = val;
-        v |= v >> 1;
-        v |= v >> 2;
-        v |= v >> 4;
-        v |= v >> 8;
-        v |= v >> 16;
-        return DeBruijnClz[(v * 0x07C4ACDDU) >> 27];
-#   endif
-    }
-}
-
-/*
- * Counts the number of trailing zeros of a `size_t`.
- * Most compilers should support CTZ as a builtin. A backup
- * implementation is provided if the builtin isn't supported, but
- * it may not be terribly efficient.
- */
-MEM_STATIC unsigned ZSTD_countTrailingZeros(size_t val)
-{
-    if (MEM_64bits()) {
-#       if (__GNUC__ >= 4)
-            return __builtin_ctzll((U64)val);
-#       else
-            static const int DeBruijnBytePos[64] = {  0,  1,  2,  7,  3, 13,  8, 19,
                                                      4, 25, 14, 28,  9, 34, 20, 56,
                                                      5, 17, 26, 54, 15, 41, 29, 43,
                                                     10, 31, 38, 35, 21, 45, 49, 57,
                                                     63,  6, 12, 18, 24, 27, 33, 55,
                                                     16, 53, 40, 42, 30, 37, 44, 48,
                                                     62, 11, 23, 32, 52, 39, 36, 47,
                                                     61, 22, 51, 46, 60, 50, 59, 58 };
-            return DeBruijnBytePos[((U64)((val & -(long long)val) * 0x0218A392CDABBD3FULL)) >> 58];
-#       endif
-    } else { /* 32 bits */
-#       if (__GNUC__ >= 3)
-            return __builtin_ctz((U32)val);
-#       else
-            static const int DeBruijnBytePos[32] = {  0,  1, 28,  2, 29, 14, 24,  3,
                                                     30, 22, 20, 15, 25, 17,  4,  8,
                                                     31, 27, 13, 23, 21, 19, 16,  7,
                                                     26, 12, 18,  6, 11,  5, 10,  9 };
-            return DeBruijnBytePos[((U32)((val & -(S32)val) * 0x077CB531U)) >> 27];
-#       endif
-    }
-}
-
-
 /* ZSTD_invalidateRepCodes() :
 * ensures next compression will not use repcodes from previous block.
 * Note : only works with regular variant;
@@ -420,13 +296,13 @@ typedef struct {
 
 /*! ZSTD_getcBlockSize() :
 *  Provides the size of compressed block from block header `src` */
-/* Used by: decompress, fullbench (does not get its definition from here) */
+/* Used by: decompress, fullbench */
 size_t ZSTD_getcBlockSize(const void* src, size_t srcSize,
                          blockProperties_t* bpPtr);
 
 /*! ZSTD_decodeSeqHeaders() :
 *  decode sequence header from src */
-/* Used by: decompress, fullbench (does not get its definition from here) */
+/* Used by: zstd_decompress_block, fullbench */
 size_t ZSTD_decodeSeqHeaders(ZSTD_DCtx* dctx, int* nbSeqPtr,
                       const void* src, size_t srcSize);
 
@@ -439,5 +315,4 @@ MEM_STATIC int ZSTD_cpuSupportsBmi2(void)
    return ZSTD_cpuid_bmi1(cpuid) && ZSTD_cpuid_bmi2(cpuid);
 }
 
-
 #endif   /* ZSTD_CCOMMON_H_MODULE */
diff --git a/lib/zstd/compress/clevels.h b/lib/zstd/compress/clevels.h
index d9a76112ec3a..6ab8be6532ef 100644
--- a/lib/zstd/compress/clevels.h
+++ b/lib/zstd/compress/clevels.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under both the BSD-style license (found in the
diff --git a/lib/zstd/compress/fse_compress.c b/lib/zstd/compress/fse_compress.c
index ec5b1ca6d71a..44a3c10becf2 100644
--- a/lib/zstd/compress/fse_compress.c
+++ b/lib/zstd/compress/fse_compress.c
@@ -1,6 +1,7 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
 /* ******************************************************************
 * FSE : Finite State Entropy encoder
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
 *
 * You can contact the author at :
 * - FSE source repository : https://github.com/Cyan4973/FiniteStateEntropy
@@ -25,7 +26,8 @@
 #include "../common/error_private.h"
 #define ZSTD_DEPS_NEED_MALLOC
 #define ZSTD_DEPS_NEED_MATH64
-#include "../common/zstd_deps.h"  /* ZSTD_malloc, ZSTD_free, ZSTD_memcpy, ZSTD_memset */
+#include "../common/zstd_deps.h"  /* ZSTD_memset */
+#include "../common/bits.h"       /* ZSTD_highbit32 */
 
 
 /* **************************************************************
@@ -90,7 +92,7 @@ size_t FSE_buildCTable_wksp(FSE_CTable* ct,
    assert(tableLog < 16);   /* required for threshold strategy to work */
 
    /* For explanations on how to distribute symbol values over the table :
-     * http://fastcompression.blogspot.fr/2014/02/fse-distributing-symbol-values.html */
+     * https://fastcompression.blogspot.fr/2014/02/fse-distributing-symbol-values.html */
 
    #ifdef __clang_analyzer__
    ZSTD_memset(tableSymbol, 0, sizeof(*tableSymbol) * tableSize);   /* useless initialization, just to keep scan-build happy */
@@ -191,7 +193,7 @@ size_t FSE_buildCTable_wksp(FSE_CTable* ct,
            break;
        default :
            assert(normalizedCounter[s] > 1);
-            {   U32 const maxBitsOut = tableLog - BIT_highbit32 ((U32)normalizedCounter[s]-1);
+            {   U32 const maxBitsOut = tableLog - ZSTD_highbit32 ((U32)normalizedCounter[s]-1);
                U32 const minStatePlus = (U32)normalizedCounter[s] << maxBitsOut;
                symbolTT[s].deltaNbBits = (maxBitsOut << 16) - minStatePlus;
                symbolTT[s].deltaFindState = (int)(total - (unsigned)normalizedCounter[s]);
@@ -224,8 +226,8 @@ size_t FSE_NCountWriteBound(unsigned maxSymbolValue, unsigned tableLog)
    size_t const maxHeaderSize = (((maxSymbolValue+1) * tableLog
                                   + 4 /* bitCount initialized at 4 */
                                   + 2 /* first two symbols may use one additional bit each */) / 8)
-                                  + 1 /* round up to whole nb bytes */
-                                  + 2 /* additional two bytes for bitstream flush */;
+                                 + 1 /* round up to whole nb bytes */
+                                 + 2 /* additional two bytes for bitstream flush */;
    return maxSymbolValue ? maxHeaderSize : FSE_NCOUNTBOUND;   /* maxSymbolValue==0 ? use default */
 }
 
@@ -254,7 +256,7 @@ FSE_writeNCount_generic (void* header, size_t headerBufferSize,
    /* Init */
    remaining = tableSize+1;   /* +1 for extra accuracy */
    threshold = tableSize;
-    nbBits = tableLog+1;
+    nbBits = (int)tableLog+1;
 
    while ((symbol < alphabetSize) && (remaining>1)) {  /* stops at 1 */
        if (previousIs0) {
@@ -273,7 +275,7 @@
            }
            while (symbol >= start+3) {
                start+=3;
-                bitStream += 3 << bitCount;
+                bitStream += 3U << bitCount;
                bitCount += 2;
            }
            bitStream += (symbol-start) << bitCount;
@@ -293,7 +295,7 @@
        count++;   /* +1 for extra accuracy */
        if (count>=threshold) count += max;   /* [0..max[ [max..threshold[ (...) [threshold+max 2*threshold[ */
-        bitStream += count << bitCount;
+        bitStream += (U32)count << bitCount;
        bitCount  += nbBits;
        bitCount  -= (count>8);
        out+= (bitCount+7) /8;
 
-    return (out-ostart);
+    assert(out >= ostart);
+    return (size_t)(out-ostart);
 }
 
 
@@ -342,21 +345,11 @@ size_t FSE_writeNCount (void* buffer, size_t bufferSize,
 *  FSE Compression Code
 ****************************************************************/
 
-FSE_CTable* FSE_createCTable (unsigned maxSymbolValue, unsigned tableLog)
-{
-    size_t size;
-    if (tableLog > FSE_TABLELOG_ABSOLUTE_MAX) tableLog = FSE_TABLELOG_ABSOLUTE_MAX;
-    size = FSE_CTABLE_SIZE_U32 (tableLog, maxSymbolValue) * sizeof(U32);
-    return (FSE_CTable*)ZSTD_malloc(size);
-}
-
-void FSE_freeCTable (FSE_CTable* ct) { ZSTD_free(ct); }
-
 /* provides the minimum logSize to safely represent a distribution */
 static unsigned FSE_minTableLog(size_t srcSize, unsigned maxSymbolValue)
 {
-    U32 minBitsSrc = BIT_highbit32((U32)(srcSize)) + 1;
-    U32 minBitsSymbols = BIT_highbit32(maxSymbolValue) + 2;
+    U32 minBitsSrc = ZSTD_highbit32((U32)(srcSize)) + 1;
+    U32 minBitsSymbols = ZSTD_highbit32(maxSymbolValue) + 2;
    U32 minBits = minBitsSrc < minBitsSymbols ? minBitsSrc : minBitsSymbols;
    assert(srcSize > 1); /* Not supported, RLE should be used instead */
    return minBits;
@@ -364,7 +357,7 @@ static unsigned FSE_minTableLog(size_t srcSize, unsigned maxSymbolValue)
 
 unsigned FSE_optimalTableLog_internal(unsigned maxTableLog, size_t srcSize, unsigned maxSymbolValue, unsigned minus)
 {
-    U32 maxBitsSrc = BIT_highbit32((U32)(srcSize - 1)) - minus;
+    U32 maxBitsSrc = ZSTD_highbit32((U32)(srcSize - 1)) - minus;
    U32 tableLog = maxTableLog;
    U32 minBits = FSE_minTableLog(srcSize, maxSymbolValue);
    assert(srcSize > 1); /* Not supported, RLE should be used instead */
@@ -532,40 +525,6 @@ size_t FSE_normalizeCount (short* normalizedCounter, unsigned tableLog,
    return tableLog;
 }
 
-
-/* fake FSE_CTable, for raw (uncompressed) input */
-size_t FSE_buildCTable_raw (FSE_CTable* ct, unsigned nbBits)
-{
-    const unsigned tableSize = 1 << nbBits;
-    const unsigned tableMask = tableSize - 1;
-    const unsigned maxSymbolValue = tableMask;
-    void* const ptr = ct;
-    U16* const tableU16 = ( (U16*) ptr) + 2;
-    void* const FSCT = ((U32*)ptr) + 1 /* header */ + (tableSize>>1);   /* assumption : tableLog >= 1 */
-    FSE_symbolCompressionTransform* const symbolTT = (FSE_symbolCompressionTransform*) (FSCT);
-    unsigned s;
-
-    /* Sanity checks */
-    if (nbBits < 1) return ERROR(GENERIC);             /* min size */
-
-    /* header */
-    tableU16[-2] = (U16) nbBits;
-    tableU16[-1] = (U16) maxSymbolValue;
-
-    /* Build table */
-    for (s=0; s= 2
+
+static size_t showU32(const U32* arr, size_t size)
 {
-    return FSE_optimalTableLog_internal(maxTableLog, srcSize, maxSymbolValue, 1);
+    size_t u;
+    for (u=0; u= sizeof(HUF_WriteCTableWksp));
+
+    assert(HUF_readCTableHeader(CTable).maxSymbolValue == maxSymbolValue);
+    assert(HUF_readCTableHeader(CTable).tableLog == huffLog);
+
    /* check conditions */
    if (workspaceSize < sizeof(HUF_WriteCTableWksp)) return ERROR(GENERIC);
    if (maxSymbolValue > HUF_SYMBOLVALUE_MAX) return ERROR(maxSymbolValue_tooLarge);
@@ -204,16 +286,6 @@ size_t HUF_writeCTable_wksp(void* dst, size_t maxDstSize,
    return ((maxSymbolValue+1)/2) + 1;
 }
 
-/*! HUF_writeCTable() :
-    `CTable` : Huffman tree to save, using huf representation.
-    @return : size of saved CTable */
-size_t HUF_writeCTable (void* dst, size_t maxDstSize,
-                        const HUF_CElt* CTable, unsigned maxSymbolValue, unsigned huffLog)
-{
-    HUF_WriteCTableWksp wksp;
-    return HUF_writeCTable_wksp(dst, maxDstSize, CTable, maxSymbolValue, huffLog, &wksp, sizeof(wksp));
-}
-
 
 size_t HUF_readCTable (HUF_CElt* CTable, unsigned* maxSymbolValuePtr, const void* src, size_t srcSize, unsigned* hasZeroWeights)
 {
@@ -231,7 +303,9 @@ size_t HUF_readCTable (HUF_CElt* CTable, unsigned* maxSymbolValuePtr, const void
    if (tableLog > HUF_TABLELOG_MAX) return ERROR(tableLog_tooLarge);
    if (nbSymbols > *maxSymbolValuePtr+1) return ERROR(maxSymbolValue_tooSmall);
 
-    CTable[0] = tableLog;
+    *maxSymbolValuePtr = nbSymbols - 1;
+
+    HUF_writeCTableHeader(CTable, tableLog, *maxSymbolValuePtr);
 
    /* Prepare base value per rank */
    {   U32 n, nextRankStart = 0;
@@ -263,74 +337,71 @@ size_t HUF_readCTable (HUF_CElt* CTable, unsigned* maxSymbolValuePtr, const void
    { U32 n; for (n=0; n HUF_readCTableHeader(CTable).maxSymbolValue)
+        return 0;
    return (U32)HUF_getNbBits(ct[symbolValue]);
 }
 
 
-typedef struct nodeElt_s {
-    U32 count;
-    U16 parent;
-    BYTE byte;
-    BYTE nbBits;
-} nodeElt;
-
 /*
 * HUF_setMaxHeight():
- * Enforces maxNbBits on the Huffman tree described in huffNode.
+ * Try to enforce @targetNbBits on the Huffman tree described in @huffNode. * - * It sets all nodes with nbBits > maxNbBits to be maxNbBits. Then it adju= sts - * the tree to so that it is a valid canonical Huffman tree. + * It attempts to convert all nodes with nbBits > @targetNbBits + * to employ @targetNbBits instead. Then it adjusts the tree + * so that it remains a valid canonical Huffman tree. * * @pre The sum of the ranks of each symbol =3D=3D 2^largest= Bits, * where largestBits =3D=3D huffNode[lastNonNull].nbBit= s. * @post The sum of the ranks of each symbol =3D=3D 2^largest= Bits, - * where largestBits is the return value <=3D maxNbBits. + * where largestBits is the return value (expected <=3D= targetNbBits). * - * @param huffNode The Huffman tree modified in place to enforce maxNbB= its. + * @param huffNode The Huffman tree modified in place to enforce target= NbBits. + * It's presumed sorted, from most frequent to rarest s= ymbol. * @param lastNonNull The symbol with the lowest count in the Huffman tree. - * @param maxNbBits The maximum allowed number of bits, which the Huffma= n tree + * @param targetNbBits The allowed number of bits, which the Huffman tree * may not respect. After this function the Huffman tre= e will - * respect maxNbBits. - * @return The maximum number of bits of the Huffman tree after= adjustment, - * necessarily no more than maxNbBits. + * respect targetNbBits. + * @return The maximum number of bits of the Huffman tree after= adjustment. */ -static U32 HUF_setMaxHeight(nodeElt* huffNode, U32 lastNonNull, U32 maxNbB= its) +static U32 HUF_setMaxHeight(nodeElt* huffNode, U32 lastNonNull, U32 target= NbBits) { const U32 largestBits =3D huffNode[lastNonNull].nbBits; - /* early exit : no elt > maxNbBits, so the tree is already valid. */ - if (largestBits <=3D maxNbBits) return largestBits; + /* early exit : no elt > targetNbBits, so the tree is already valid. */ + if (largestBits <=3D targetNbBits) return largestBits; + + DEBUGLOG(5, "HUF_setMaxHeight (targetNbBits =3D %u)", targetNbBits); =20 /* there are several too large elements (at least >=3D 2) */ { int totalCost =3D 0; - const U32 baseCost =3D 1 << (largestBits - maxNbBits); + const U32 baseCost =3D 1 << (largestBits - targetNbBits); int n =3D (int)lastNonNull; =20 - /* Adjust any ranks > maxNbBits to maxNbBits. + /* Adjust any ranks > targetNbBits to targetNbBits. * Compute totalCost, which is how far the sum of the ranks is * we are over 2^largestBits after adjust the offending ranks. 
*/ - while (huffNode[n].nbBits > maxNbBits) { + while (huffNode[n].nbBits > targetNbBits) { totalCost +=3D baseCost - (1 << (largestBits - huffNode[n].nbB= its)); - huffNode[n].nbBits =3D (BYTE)maxNbBits; + huffNode[n].nbBits =3D (BYTE)targetNbBits; n--; } - /* n stops at huffNode[n].nbBits <=3D maxNbBits */ - assert(huffNode[n].nbBits <=3D maxNbBits); - /* n end at index of smallest symbol using < maxNbBits */ - while (huffNode[n].nbBits =3D=3D maxNbBits) --n; + /* n stops at huffNode[n].nbBits <=3D targetNbBits */ + assert(huffNode[n].nbBits <=3D targetNbBits); + /* n end at index of smallest symbol using < targetNbBits */ + while (huffNode[n].nbBits =3D=3D targetNbBits) --n; =20 - /* renorm totalCost from 2^largestBits to 2^maxNbBits + /* renorm totalCost from 2^largestBits to 2^targetNbBits * note : totalCost is necessarily a multiple of baseCost */ - assert((totalCost & (baseCost - 1)) =3D=3D 0); - totalCost >>=3D (largestBits - maxNbBits); + assert(((U32)totalCost & (baseCost - 1)) =3D=3D 0); + totalCost >>=3D (largestBits - targetNbBits); assert(totalCost > 0); =20 /* repay normalized cost */ @@ -339,19 +410,19 @@ static U32 HUF_setMaxHeight(nodeElt* huffNode, U32 la= stNonNull, U32 maxNbBits) =20 /* Get pos of last (smallest =3D lowest cum. count) symbol per= rank */ ZSTD_memset(rankLast, 0xF0, sizeof(rankLast)); - { U32 currentNbBits =3D maxNbBits; + { U32 currentNbBits =3D targetNbBits; int pos; for (pos=3Dn ; pos >=3D 0; pos--) { if (huffNode[pos].nbBits >=3D currentNbBits) continue; - currentNbBits =3D huffNode[pos].nbBits; /* < maxNbBi= ts */ - rankLast[maxNbBits-currentNbBits] =3D (U32)pos; + currentNbBits =3D huffNode[pos].nbBits; /* < targetN= bBits */ + rankLast[targetNbBits-currentNbBits] =3D (U32)pos; } } =20 while (totalCost > 0) { /* Try to reduce the next power of 2 above totalCost becau= se we * gain back half the rank. */ - U32 nBitsToDecrease =3D BIT_highbit32((U32)totalCost) + 1; + U32 nBitsToDecrease =3D ZSTD_highbit32((U32)totalCost) + 1; for ( ; nBitsToDecrease > 1; nBitsToDecrease--) { U32 const highPos =3D rankLast[nBitsToDecrease]; U32 const lowPos =3D rankLast[nBitsToDecrease-1]; @@ -391,7 +462,7 @@ static U32 HUF_setMaxHeight(nodeElt* huffNode, U32 last= NonNull, U32 maxNbBits) rankLast[nBitsToDecrease] =3D noSymbol; else { rankLast[nBitsToDecrease]--; - if (huffNode[rankLast[nBitsToDecrease]].nbBits !=3D ma= xNbBits-nBitsToDecrease) + if (huffNode[rankLast[nBitsToDecrease]].nbBits !=3D ta= rgetNbBits-nBitsToDecrease) rankLast[nBitsToDecrease] =3D noSymbol; /* this = rank is now empty */ } } /* while (totalCost > 0) */ @@ -403,11 +474,11 @@ static U32 HUF_setMaxHeight(nodeElt* huffNode, U32 la= stNonNull, U32 maxNbBits) * TODO. */ while (totalCost < 0) { /* Sometimes, cost correction oversho= ot */ - /* special case : no rank 1 symbol (using maxNbBits-1); - * let's create one from largest rank 0 (using maxNbBits). + /* special case : no rank 1 symbol (using targetNbBits-1); + * let's create one from largest rank 0 (using targetNbBit= s). 
*/ if (rankLast[1] =3D=3D noSymbol) { - while (huffNode[n].nbBits =3D=3D maxNbBits) n--; + while (huffNode[n].nbBits =3D=3D targetNbBits) n--; huffNode[n+1].nbBits--; assert(n >=3D 0); rankLast[1] =3D (U32)(n+1); @@ -421,7 +492,7 @@ static U32 HUF_setMaxHeight(nodeElt* huffNode, U32 last= NonNull, U32 maxNbBits) } /* repay normalized cost */ } /* there are several too large elements (at least >=3D 2) */ =20 - return maxNbBits; + return targetNbBits; } =20 typedef struct { @@ -429,7 +500,7 @@ typedef struct { U16 curr; } rankPos; =20 -typedef nodeElt huffNodeTable[HUF_CTABLE_WORKSPACE_SIZE_U32]; +typedef nodeElt huffNodeTable[2 * (HUF_SYMBOLVALUE_MAX + 1)]; =20 /* Number of buckets available for HUF_sort() */ #define RANK_POSITION_TABLE_SIZE 192 @@ -448,8 +519,8 @@ typedef struct { * Let buckets 166 to 192 represent all remaining counts up to RANK_POSITI= ON_MAX_COUNT_LOG using log2 bucketing. */ #define RANK_POSITION_MAX_COUNT_LOG 32 -#define RANK_POSITION_LOG_BUCKETS_BEGIN (RANK_POSITION_TABLE_SIZE - 1) - R= ANK_POSITION_MAX_COUNT_LOG - 1 /* =3D=3D 158 */ -#define RANK_POSITION_DISTINCT_COUNT_CUTOFF RANK_POSITION_LOG_BUCKETS_BEGI= N + BIT_highbit32(RANK_POSITION_LOG_BUCKETS_BEGIN) /* =3D=3D 166 */ +#define RANK_POSITION_LOG_BUCKETS_BEGIN ((RANK_POSITION_TABLE_SIZE - 1) - = RANK_POSITION_MAX_COUNT_LOG - 1 /* =3D=3D 158 */) +#define RANK_POSITION_DISTINCT_COUNT_CUTOFF (RANK_POSITION_LOG_BUCKETS_BEG= IN + ZSTD_highbit32(RANK_POSITION_LOG_BUCKETS_BEGIN) /* =3D=3D 166 */) =20 /* Return the appropriate bucket index for a given count. See definition of * RANK_POSITION_DISTINCT_COUNT_CUTOFF for explanation of bucketing strate= gy. @@ -457,7 +528,7 @@ typedef struct { static U32 HUF_getIndex(U32 const count) { return (count < RANK_POSITION_DISTINCT_COUNT_CUTOFF) ? count - : BIT_highbit32(count) + RANK_POSITION_LOG_BUCKETS_BEGIN; + : ZSTD_highbit32(count) + RANK_POSITION_LOG_BUCKETS_BEGIN; } =20 /* Helper swap function for HUF_quickSortPartition() */ @@ -580,7 +651,7 @@ static void HUF_sort(nodeElt huffNode[], const unsigned= count[], U32 const maxSy =20 /* Sort each bucket. */ for (n =3D RANK_POSITION_DISTINCT_COUNT_CUTOFF; n < RANK_POSITION_TABL= E_SIZE - 1; ++n) { - U32 const bucketSize =3D rankPosition[n].curr-rankPosition[n].base; + int const bucketSize =3D rankPosition[n].curr - rankPosition[n].ba= se; U32 const bucketStartIdx =3D rankPosition[n].base; if (bucketSize > 1) { assert(bucketStartIdx < maxSymbolValue1); @@ -591,6 +662,7 @@ static void HUF_sort(nodeElt huffNode[], const unsigned= count[], U32 const maxSy assert(HUF_isSorted(huffNode, maxSymbolValue1)); } =20 + /* HUF_buildCTable_wksp() : * Same as HUF_buildCTable(), but using externally allocated scratch buff= er. * `workSpace` must be aligned on 4-bytes boundaries, and be at least as = large as sizeof(HUF_buildCTable_wksp_tables). 
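
The cost accounting in HUF_setMaxHeight() above is easiest to follow in Kraft-sum terms: a valid canonical tree satisfies sum(2^(largestBits - nbBits[s])) == 2^largestBits, so capping every over-long symbol at targetNbBits overshoots that budget, and totalCost tracks the overshoot until it is repaid by lengthening cheaper symbols. A self-contained restatement of the same arithmetic on a toy tree (hypothetical lengths, chosen to satisfy the Kraft equality):

    #include <assert.h>

    int main(void)
    {
        unsigned nbBits[5] = { 1, 2, 3, 4, 4 };     /* toy canonical lengths */
        unsigned const largestBits  = 4;
        unsigned const targetNbBits = 3;
        unsigned const baseCost = 1u << (largestBits - targetNbBits);  /* == 2 */
        unsigned totalCost = 0;
        int s = 4;

        /* cap over-long symbols, accumulating the Kraft overshoot */
        while (nbBits[s] > targetNbBits) {
            totalCost += baseCost - (1u << (largestBits - nbBits[s]));
            nbBits[s] = targetNbBits;
            s--;
        }
        assert((totalCost & (baseCost - 1)) == 0);   /* always a multiple of baseCost */
        totalCost >>= (largestBits - targetNbBits);  /* renorm to the 2^target scale */
        assert(totalCost == 1);

        /* repay: lengthening a symbol from L bits to L+1 frees
         * 2^(targetNbBits - L - 1) units; the 2-bit symbol frees exactly 1 */
        nbBits[1] = 3;
        totalCost -= 1;
        assert(totalCost == 0);  /* {1,3,3,3,3} meets the Kraft equality again */
        return 0;
    }
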
@@ -611,6 +683,7 @@ static int HUF_buildTree(nodeElt* huffNode, U32 maxSymb= olValue) int lowS, lowN; int nodeNb =3D STARTNODE; int n, nodeRoot; + DEBUGLOG(5, "HUF_buildTree (alphabet size =3D %u)", maxSymbolValue + 1= ); /* init for parents */ nonNullRank =3D (int)maxSymbolValue; while(huffNode[nonNullRank].count =3D=3D 0) nonNullRank--; @@ -637,6 +710,8 @@ static int HUF_buildTree(nodeElt* huffNode, U32 maxSymb= olValue) for (n=3D0; n<=3DnonNullRank; n++) huffNode[n].nbBits =3D huffNode[ huffNode[n].parent ].nbBits + 1; =20 + DEBUGLOG(6, "Initial distribution of bits completed (%zu sorted symbol= s)", showHNodeBits(huffNode, maxSymbolValue+1)); + return nonNullRank; } =20 @@ -671,31 +746,40 @@ static void HUF_buildCTableFromTree(HUF_CElt* CTable,= nodeElt const* huffNode, i HUF_setNbBits(ct + huffNode[n].byte, huffNode[n].nbBits); /* pus= h nbBits per symbol, symbol order */ for (n=3D0; nhuffNodeTbl; nodeElt* const huffNode =3D huffNode0+1; int nonNullRank; =20 + HUF_STATIC_ASSERT(HUF_CTABLE_WORKSPACE_SIZE =3D=3D sizeof(HUF_buildCTa= ble_wksp_tables)); + + DEBUGLOG(5, "HUF_buildCTable_wksp (alphabet size =3D %u)", maxSymbolVa= lue+1); + /* safety checks */ if (wkspSize < sizeof(HUF_buildCTable_wksp_tables)) - return ERROR(workSpace_tooSmall); + return ERROR(workSpace_tooSmall); if (maxNbBits =3D=3D 0) maxNbBits =3D HUF_TABLELOG_DEFAULT; if (maxSymbolValue > HUF_SYMBOLVALUE_MAX) - return ERROR(maxSymbolValue_tooLarge); + return ERROR(maxSymbolValue_tooLarge); ZSTD_memset(huffNode0, 0, sizeof(huffNodeTable)); =20 /* sort, decreasing order */ HUF_sort(huffNode, count, maxSymbolValue, wksp_tables->rankPosition); + DEBUGLOG(6, "sorted symbols completed (%zu symbols)", showHNodeSymbols= (huffNode, maxSymbolValue+1)); =20 /* build tree */ nonNullRank =3D HUF_buildTree(huffNode, maxSymbolValue); =20 - /* enforce maxTableLog */ + /* determine and enforce maxTableLog */ maxNbBits =3D HUF_setMaxHeight(huffNode, (U32)nonNullRank, maxNbBits); if (maxNbBits > HUF_TABLELOG_MAX) return ERROR(GENERIC); /* check fi= t into table */ =20 @@ -716,13 +800,20 @@ size_t HUF_estimateCompressedSize(const HUF_CElt* CTa= ble, const unsigned* count, } =20 int HUF_validateCTable(const HUF_CElt* CTable, const unsigned* count, unsi= gned maxSymbolValue) { - HUF_CElt const* ct =3D CTable + 1; - int bad =3D 0; - int s; - for (s =3D 0; s <=3D (int)maxSymbolValue; ++s) { - bad |=3D (count[s] !=3D 0) & (HUF_getNbBits(ct[s]) =3D=3D 0); - } - return !bad; + HUF_CTableHeader header =3D HUF_readCTableHeader(CTable); + HUF_CElt const* ct =3D CTable + 1; + int bad =3D 0; + int s; + + assert(header.tableLog <=3D HUF_TABLELOG_ABSOLUTEMAX); + + if (header.maxSymbolValue < maxSymbolValue) + return 0; + + for (s =3D 0; s <=3D (int)maxSymbolValue; ++s) { + bad |=3D (count[s] !=3D 0) & (HUF_getNbBits(ct[s]) =3D=3D 0); + } + return !bad; } =20 size_t HUF_compressBound(size_t size) { return HUF_COMPRESSBOUND(size); } @@ -804,7 +895,7 @@ FORCE_INLINE_TEMPLATE void HUF_addBits(HUF_CStream_t* b= itC, HUF_CElt elt, int id #if DEBUGLEVEL >=3D 1 { size_t const nbBits =3D HUF_getNbBits(elt); - size_t const dirtyBits =3D nbBits =3D=3D 0 ? 0 : BIT_highbit32((U3= 2)nbBits) + 1; + size_t const dirtyBits =3D nbBits =3D=3D 0 ? 0 : ZSTD_highbit32((U= 32)nbBits) + 1; (void)dirtyBits; /* Middle bits are 0. 
*/ assert(((elt >> dirtyBits) << (dirtyBits + nbBits)) =3D=3D 0); @@ -884,7 +975,7 @@ static size_t HUF_closeCStream(HUF_CStream_t* bitC) { size_t const nbBits =3D bitC->bitPos[0] & 0xFF; if (bitC->ptr >=3D bitC->endPtr) return 0; /* overflow detected */ - return (bitC->ptr - bitC->startPtr) + (nbBits > 0); + return (size_t)(bitC->ptr - bitC->startPtr) + (nbBits > 0); } } =20 @@ -964,17 +1055,17 @@ HUF_compress1X_usingCTable_internal_body(void* dst, = size_t dstSize, const void* src, size_t srcSize, const HUF_CElt* CTable) { - U32 const tableLog =3D (U32)CTable[0]; + U32 const tableLog =3D HUF_readCTableHeader(CTable).tableLog; HUF_CElt const* ct =3D CTable + 1; const BYTE* ip =3D (const BYTE*) src; BYTE* const ostart =3D (BYTE*)dst; BYTE* const oend =3D ostart + dstSize; - BYTE* op =3D ostart; HUF_CStream_t bitC; =20 /* init */ if (dstSize < 8) return 0; /* not enough space to compress */ - { size_t const initErr =3D HUF_initCStream(&bitC, op, (size_t)(oend-op= )); + { BYTE* op =3D ostart; + size_t const initErr =3D HUF_initCStream(&bitC, op, (size_t)(oend-op= )); if (HUF_isError(initErr)) return 0; } =20 if (dstSize < HUF_tightCompressBound(srcSize, (size_t)tableLog) || tab= leLog > 11) @@ -1045,9 +1136,9 @@ HUF_compress1X_usingCTable_internal_default(void* dst= , size_t dstSize, static size_t HUF_compress1X_usingCTable_internal(void* dst, size_t dstSize, const void* src, size_t srcSize, - const HUF_CElt* CTable, const int bmi2) + const HUF_CElt* CTable, const int flags) { - if (bmi2) { + if (flags & HUF_flags_bmi2) { return HUF_compress1X_usingCTable_internal_bmi2(dst, dstSize, src,= srcSize, CTable); } return HUF_compress1X_usingCTable_internal_default(dst, dstSize, src, = srcSize, CTable); @@ -1058,28 +1149,23 @@ HUF_compress1X_usingCTable_internal(void* dst, size= _t dstSize, static size_t HUF_compress1X_usingCTable_internal(void* dst, size_t dstSize, const void* src, size_t srcSize, - const HUF_CElt* CTable, const int bmi2) + const HUF_CElt* CTable, const int flags) { - (void)bmi2; + (void)flags; return HUF_compress1X_usingCTable_internal_body(dst, dstSize, src, src= Size, CTable); } =20 #endif =20 -size_t HUF_compress1X_usingCTable(void* dst, size_t dstSize, const void* s= rc, size_t srcSize, const HUF_CElt* CTable) +size_t HUF_compress1X_usingCTable(void* dst, size_t dstSize, const void* s= rc, size_t srcSize, const HUF_CElt* CTable, int flags) { - return HUF_compress1X_usingCTable_bmi2(dst, dstSize, src, srcSize, CTa= ble, /* bmi2 */ 0); -} - -size_t HUF_compress1X_usingCTable_bmi2(void* dst, size_t dstSize, const vo= id* src, size_t srcSize, const HUF_CElt* CTable, int bmi2) -{ - return HUF_compress1X_usingCTable_internal(dst, dstSize, src, srcSize,= CTable, bmi2); + return HUF_compress1X_usingCTable_internal(dst, dstSize, src, srcSize,= CTable, flags); } =20 static size_t HUF_compress4X_usingCTable_internal(void* dst, size_t dstSize, const void* src, size_t srcSize, - const HUF_CElt* CTable, int bmi2) + const HUF_CElt* CTable, int flags) { size_t const segmentSize =3D (srcSize+3)/4; /* first 3 segments */ const BYTE* ip =3D (const BYTE*) src; @@ -1093,7 +1179,7 @@ HUF_compress4X_usingCTable_internal(void* dst, size_t= dstSize, op +=3D 6; /* jumpTable */ =20 assert(op <=3D oend); - { CHECK_V_F(cSize, HUF_compress1X_usingCTable_internal(op, (size_t)(= oend-op), ip, segmentSize, CTable, bmi2) ); + { CHECK_V_F(cSize, HUF_compress1X_usingCTable_internal(op, (size_t)(= oend-op), ip, segmentSize, CTable, flags) ); if (cSize =3D=3D 0 || cSize > 65535) return 0; MEM_writeLE16(ostart, 
(U16)cSize); op +=3D cSize; @@ -1101,7 +1187,7 @@ HUF_compress4X_usingCTable_internal(void* dst, size_t= dstSize, =20 ip +=3D segmentSize; assert(op <=3D oend); - { CHECK_V_F(cSize, HUF_compress1X_usingCTable_internal(op, (size_t)(= oend-op), ip, segmentSize, CTable, bmi2) ); + { CHECK_V_F(cSize, HUF_compress1X_usingCTable_internal(op, (size_t)(= oend-op), ip, segmentSize, CTable, flags) ); if (cSize =3D=3D 0 || cSize > 65535) return 0; MEM_writeLE16(ostart+2, (U16)cSize); op +=3D cSize; @@ -1109,7 +1195,7 @@ HUF_compress4X_usingCTable_internal(void* dst, size_t= dstSize, =20 ip +=3D segmentSize; assert(op <=3D oend); - { CHECK_V_F(cSize, HUF_compress1X_usingCTable_internal(op, (size_t)(= oend-op), ip, segmentSize, CTable, bmi2) ); + { CHECK_V_F(cSize, HUF_compress1X_usingCTable_internal(op, (size_t)(= oend-op), ip, segmentSize, CTable, flags) ); if (cSize =3D=3D 0 || cSize > 65535) return 0; MEM_writeLE16(ostart+4, (U16)cSize); op +=3D cSize; @@ -1118,7 +1204,7 @@ HUF_compress4X_usingCTable_internal(void* dst, size_t= dstSize, ip +=3D segmentSize; assert(op <=3D oend); assert(ip <=3D iend); - { CHECK_V_F(cSize, HUF_compress1X_usingCTable_internal(op, (size_t)(= oend-op), ip, (size_t)(iend-ip), CTable, bmi2) ); + { CHECK_V_F(cSize, HUF_compress1X_usingCTable_internal(op, (size_t)(= oend-op), ip, (size_t)(iend-ip), CTable, flags) ); if (cSize =3D=3D 0 || cSize > 65535) return 0; op +=3D cSize; } @@ -1126,14 +1212,9 @@ HUF_compress4X_usingCTable_internal(void* dst, size_= t dstSize, return (size_t)(op-ostart); } =20 -size_t HUF_compress4X_usingCTable(void* dst, size_t dstSize, const void* s= rc, size_t srcSize, const HUF_CElt* CTable) -{ - return HUF_compress4X_usingCTable_bmi2(dst, dstSize, src, srcSize, CTa= ble, /* bmi2 */ 0); -} - -size_t HUF_compress4X_usingCTable_bmi2(void* dst, size_t dstSize, const vo= id* src, size_t srcSize, const HUF_CElt* CTable, int bmi2) +size_t HUF_compress4X_usingCTable(void* dst, size_t dstSize, const void* s= rc, size_t srcSize, const HUF_CElt* CTable, int flags) { - return HUF_compress4X_usingCTable_internal(dst, dstSize, src, srcSize,= CTable, bmi2); + return HUF_compress4X_usingCTable_internal(dst, dstSize, src, srcSize,= CTable, flags); } =20 typedef enum { HUF_singleStream, HUF_fourStreams } HUF_nbStreams_e; @@ -1141,11 +1222,11 @@ typedef enum { HUF_singleStream, HUF_fourStreams } = HUF_nbStreams_e; static size_t HUF_compressCTable_internal( BYTE* const ostart, BYTE* op, BYTE* const oend, const void* src, size_t srcSize, - HUF_nbStreams_e nbStreams, const HUF_CElt* CTable, const i= nt bmi2) + HUF_nbStreams_e nbStreams, const HUF_CElt* CTable, const i= nt flags) { size_t const cSize =3D (nbStreams=3D=3DHUF_singleStream) ? 
- HUF_compress1X_usingCTable_internal(op, (size_t)(= oend - op), src, srcSize, CTable, bmi2) : - HUF_compress4X_usingCTable_internal(op, (size_t)(= oend - op), src, srcSize, CTable, bmi2); + HUF_compress1X_usingCTable_internal(op, (size_t)(= oend - op), src, srcSize, CTable, flags) : + HUF_compress4X_usingCTable_internal(op, (size_t)(= oend - op), src, srcSize, CTable, flags); if (HUF_isError(cSize)) { return cSize; } if (cSize=3D=3D0) { return 0; } /* uncompressible */ op +=3D cSize; @@ -1168,6 +1249,81 @@ typedef struct { #define SUSPECT_INCOMPRESSIBLE_SAMPLE_SIZE 4096 #define SUSPECT_INCOMPRESSIBLE_SAMPLE_RATIO 10 /* Must be >=3D 2 */ =20 +unsigned HUF_cardinality(const unsigned* count, unsigned maxSymbolValue) +{ + unsigned cardinality =3D 0; + unsigned i; + + for (i =3D 0; i < maxSymbolValue + 1; i++) { + if (count[i] !=3D 0) cardinality +=3D 1; + } + + return cardinality; +} + +unsigned HUF_minTableLog(unsigned symbolCardinality) +{ + U32 minBitsSymbols =3D ZSTD_highbit32(symbolCardinality) + 1; + return minBitsSymbols; +} + +unsigned HUF_optimalTableLog( + unsigned maxTableLog, + size_t srcSize, + unsigned maxSymbolValue, + void* workSpace, size_t wkspSize, + HUF_CElt* table, + const unsigned* count, + int flags) +{ + assert(srcSize > 1); /* Not supported, RLE should be used instead */ + assert(wkspSize >=3D sizeof(HUF_buildCTable_wksp_tables)); + + if (!(flags & HUF_flags_optimalDepth)) { + /* cheap evaluation, based on FSE */ + return FSE_optimalTableLog_internal(maxTableLog, srcSize, maxSymbo= lValue, 1); + } + + { BYTE* dst =3D (BYTE*)workSpace + sizeof(HUF_WriteCTableWksp); + size_t dstSize =3D wkspSize - sizeof(HUF_WriteCTableWksp); + size_t hSize, newSize; + const unsigned symbolCardinality =3D HUF_cardinality(count, maxSym= bolValue); + const unsigned minTableLog =3D HUF_minTableLog(symbolCardinality); + size_t optSize =3D ((size_t) ~0) - 1; + unsigned optLog =3D maxTableLog, optLogGuess; + + DEBUGLOG(6, "HUF_optimalTableLog: probing huf depth (srcSize=3D%zu= )", srcSize); + + /* Search until size increases */ + for (optLogGuess =3D minTableLog; optLogGuess <=3D maxTableLog; op= tLogGuess++) { + DEBUGLOG(7, "checking for huffLog=3D%u", optLogGuess); + + { size_t maxBits =3D HUF_buildCTable_wksp(table, count, maxS= ymbolValue, optLogGuess, workSpace, wkspSize); + if (ERR_isError(maxBits)) continue; + + if (maxBits < optLogGuess && optLogGuess > minTableLog) br= eak; + + hSize =3D HUF_writeCTable_wksp(dst, dstSize, table, maxSym= bolValue, (U32)maxBits, workSpace, wkspSize); + } + + if (ERR_isError(hSize)) continue; + + newSize =3D HUF_estimateCompressedSize(table, count, maxSymbol= Value) + hSize; + + if (newSize > optSize + 1) { + break; + } + + if (newSize < optSize) { + optSize =3D newSize; + optLog =3D optLogGuess; + } + } + assert(optLog <=3D HUF_TABLELOG_MAX); + return optLog; + } +} + /* HUF_compress_internal() : * `workSpace_align4` must be aligned on 4-bytes boundaries, * and occupies the same space as a table of HUF_WORKSPACE_SIZE_U64 unsign= ed */ @@ -1177,14 +1333,14 @@ HUF_compress_internal (void* dst, size_t dstSize, unsigned maxSymbolValue, unsigned huffLog, HUF_nbStreams_e nbStreams, void* workSpace, size_t wkspSize, - HUF_CElt* oldHufTable, HUF_repeat* repeat, int pref= erRepeat, - const int bmi2, unsigned suspectUncompressible) + HUF_CElt* oldHufTable, HUF_repeat* repeat, int flag= s) { HUF_compress_tables_t* const table =3D (HUF_compress_tables_t*)HUF_ali= gnUpWorkspace(workSpace, &wkspSize, ZSTD_ALIGNOF(size_t)); BYTE* const ostart =3D (BYTE*)dst; BYTE* 
const oend =3D ostart + dstSize; BYTE* op =3D ostart; =20 + DEBUGLOG(5, "HUF_compress_internal (srcSize=3D%zu)", srcSize); HUF_STATIC_ASSERT(sizeof(*table) + HUF_WORKSPACE_MAX_ALIGNMENT <=3D HU= F_WORKSPACE_SIZE); =20 /* checks & inits */ @@ -1198,16 +1354,17 @@ HUF_compress_internal (void* dst, size_t dstSize, if (!huffLog) huffLog =3D HUF_TABLELOG_DEFAULT; =20 /* Heuristic : If old table is valid, use it for small inputs */ - if (preferRepeat && repeat && *repeat =3D=3D HUF_repeat_valid) { + if ((flags & HUF_flags_preferRepeat) && repeat && *repeat =3D=3D HUF_r= epeat_valid) { return HUF_compressCTable_internal(ostart, op, oend, src, srcSize, - nbStreams, oldHufTable, bmi2); + nbStreams, oldHufTable, flags); } =20 /* If uncompressible data is suspected, do a smaller sampling first */ DEBUG_STATIC_ASSERT(SUSPECT_INCOMPRESSIBLE_SAMPLE_RATIO >=3D 2); - if (suspectUncompressible && srcSize >=3D (SUSPECT_INCOMPRESSIBLE_SAMP= LE_SIZE * SUSPECT_INCOMPRESSIBLE_SAMPLE_RATIO)) { + if ((flags & HUF_flags_suspectUncompressible) && srcSize >=3D (SUSPECT= _INCOMPRESSIBLE_SAMPLE_SIZE * SUSPECT_INCOMPRESSIBLE_SAMPLE_RATIO)) { size_t largestTotal =3D 0; + DEBUGLOG(5, "input suspected incompressible : sampling to check"); { unsigned maxSymbolValueBegin =3D maxSymbolValue; CHECK_V_F(largestBegin, HIST_count_simple (table->count, &maxS= ymbolValueBegin, (const BYTE*)src, SUSPECT_INCOMPRESSIBLE_SAMPLE_SIZE) ); largestTotal +=3D largestBegin; @@ -1224,6 +1381,7 @@ HUF_compress_internal (void* dst, size_t dstSize, if (largest =3D=3D srcSize) { *ostart =3D ((const BYTE*)src)[0]; r= eturn 1; } /* single symbol, rle */ if (largest <=3D (srcSize >> 7)+4) return 0; /* heuristic : prob= ably not compressible enough */ } + DEBUGLOG(6, "histogram detail completed (%zu symbols)", showU32(table-= >count, maxSymbolValue+1)); =20 /* Check validity of previous table */ if ( repeat @@ -1232,25 +1390,20 @@ HUF_compress_internal (void* dst, size_t dstSize, *repeat =3D HUF_repeat_none; } /* Heuristic : use existing table for small inputs */ - if (preferRepeat && repeat && *repeat !=3D HUF_repeat_none) { + if ((flags & HUF_flags_preferRepeat) && repeat && *repeat !=3D HUF_rep= eat_none) { return HUF_compressCTable_internal(ostart, op, oend, src, srcSize, - nbStreams, oldHufTable, bmi2); + nbStreams, oldHufTable, flags); } =20 /* Build Huffman Tree */ - huffLog =3D HUF_optimalTableLog(huffLog, srcSize, maxSymbolValue); + huffLog =3D HUF_optimalTableLog(huffLog, srcSize, maxSymbolValue, &tab= le->wksps, sizeof(table->wksps), table->CTable, table->count, flags); { size_t const maxBits =3D HUF_buildCTable_wksp(table->CTable, table= ->count, maxSymbolValue, huffLog, &table->wksps.buildCTable_wksp= , sizeof(table->wksps.buildCTable_wksp)); CHECK_F(maxBits); huffLog =3D (U32)maxBits; - } - /* Zero unused symbols in CTable, so we can check it for validity */ - { - size_t const ctableSize =3D HUF_CTABLE_SIZE_ST(maxSymbolValue); - size_t const unusedSize =3D sizeof(table->CTable) - ctableSize * s= izeof(HUF_CElt); - ZSTD_memset(table->CTable + ctableSize, 0, unusedSize); + DEBUGLOG(6, "bit distribution completed (%zu symbols)", showCTable= Bits(table->CTable + 1, maxSymbolValue+1)); } =20 /* Write table description header */ @@ -1263,7 +1416,7 @@ HUF_compress_internal (void* dst, size_t dstSize, if (oldSize <=3D hSize + newSize || hSize + 12 >=3D srcSize) { return HUF_compressCTable_internal(ostart, op, oend, src, srcSize, - nbStreams, oldHufTable,= bmi2); + nbStreams, oldHufTable,= flags); } } =20 /* Use the new huffman table */ @@ 
-1275,61 +1428,35 @@ HUF_compress_internal (void* dst, size_t dstSize, } return HUF_compressCTable_internal(ostart, op, oend, src, srcSize, - nbStreams, table->CTable, bmi2); -} - - -size_t HUF_compress1X_wksp (void* dst, size_t dstSize, - const void* src, size_t srcSize, - unsigned maxSymbolValue, unsigned huffLog, - void* workSpace, size_t wkspSize) -{ - return HUF_compress_internal(dst, dstSize, src, srcSize, - maxSymbolValue, huffLog, HUF_singleStream, - workSpace, wkspSize, - NULL, NULL, 0, 0 /*bmi2*/, 0); + nbStreams, table->CTable, flags); } =20 size_t HUF_compress1X_repeat (void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned huffLog, void* workSpace, size_t wkspSize, - HUF_CElt* hufTable, HUF_repeat* repeat, int preferRe= peat, - int bmi2, unsigned suspectUncompressible) + HUF_CElt* hufTable, HUF_repeat* repeat, int flags) { + DEBUGLOG(5, "HUF_compress1X_repeat (srcSize =3D %zu)", srcSize); return HUF_compress_internal(dst, dstSize, src, srcSize, maxSymbolValue, huffLog, HUF_singleStream, workSpace, wkspSize, hufTable, - repeat, preferRepeat, bmi2, suspectUncomp= ressible); -} - -/* HUF_compress4X_repeat(): - * compress input using 4 streams. - * provide workspace to generate compression tables */ -size_t HUF_compress4X_wksp (void* dst, size_t dstSize, - const void* src, size_t srcSize, - unsigned maxSymbolValue, unsigned huffLog, - void* workSpace, size_t wkspSize) -{ - return HUF_compress_internal(dst, dstSize, src, srcSize, - maxSymbolValue, huffLog, HUF_fourStreams, - workSpace, wkspSize, - NULL, NULL, 0, 0 /*bmi2*/, 0); + repeat, flags); } =20 /* HUF_compress4X_repeat(): * compress input using 4 streams. * consider skipping quickly - * re-use an existing huffman compression table */ + * reuse an existing huffman compression table */ size_t HUF_compress4X_repeat (void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned huffLog, void* workSpace, size_t wkspSize, - HUF_CElt* hufTable, HUF_repeat* repeat, int preferRe= peat, int bmi2, unsigned suspectUncompressible) + HUF_CElt* hufTable, HUF_repeat* repeat, int flags) { + DEBUGLOG(5, "HUF_compress4X_repeat (srcSize =3D %zu)", srcSize); return HUF_compress_internal(dst, dstSize, src, srcSize, maxSymbolValue, huffLog, HUF_fourStreams, workSpace, wkspSize, - hufTable, repeat, preferRepeat, bmi2, sus= pectUncompressible); + hufTable, repeat, flags); } - diff --git a/lib/zstd/compress/zstd_compress.c b/lib/zstd/compress/zstd_com= press.c index 16bb995bc6c4..c41a747413e0 100644 --- a/lib/zstd/compress/zstd_compress.c +++ b/lib/zstd/compress/zstd_compress.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
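
Two API shifts in the huf_compress.c hunks above are worth flagging for in-kernel callers. First, the trailing boolean parameters (bmi2, preferRepeat, suspectUncompressible) are folded into a single int flags bit-set (HUF_flags_bmi2, HUF_flags_preferRepeat, HUF_flags_suspectUncompressible, ...). Second, HUF_optimalTableLog() no longer always reuses FSE's cheap heuristic: when HUF_flags_optimalDepth is set, it probes each candidate depth from HUF_minTableLog(cardinality) up to maxTableLog, builds a real table per guess, and keeps the depth minimizing header size plus estimated payload, stopping once the size trend turns upward. The shape of that search, reduced to a sketch (evaluate() is a toy stand-in for the build-table + write-header + estimate-size sequence, not upstream code):

    #include <stddef.h>

    /* hypothetical stand-in: a convex cost curve with its minimum at 7 */
    static size_t evaluate(unsigned logGuess)
    {
        return (size_t)(logGuess > 7 ? logGuess - 7 : 7 - logGuess) + 10;
    }

    static unsigned searchOptimalLog(unsigned minLog, unsigned maxLog)
    {
        size_t optSize = ((size_t)~0) - 1;   /* sentinel, as upstream uses */
        unsigned optLog = maxLog;
        unsigned guess;
        for (guess = minLog; guess <= maxLog; guess++) {
            size_t const newSize = evaluate(guess);
            if (newSize > optSize + 1) break;   /* size growing again: stop early */
            if (newSize < optSize) { optSize = newSize; optLog = guess; }
        }
        return optLog;   /* == 7 for the toy cost above */
    }
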
* * This source code is licensed under both the BSD-style license (found in= the @@ -11,12 +12,13 @@ /*-************************************* * Dependencies ***************************************/ +#include "../common/allocations.h" /* ZSTD_customMalloc, ZSTD_customCallo= c, ZSTD_customFree */ #include "../common/zstd_deps.h" /* INT_MAX, ZSTD_memset, ZSTD_memcpy */ #include "../common/mem.h" +#include "../common/error_private.h" #include "hist.h" /* HIST_countFast_wksp */ #define FSE_STATIC_LINKING_ONLY /* FSE_encodeSymbol */ #include "../common/fse.h" -#define HUF_STATIC_LINKING_ONLY #include "../common/huf.h" #include "zstd_compress_internal.h" #include "zstd_compress_sequences.h" @@ -27,6 +29,7 @@ #include "zstd_opt.h" #include "zstd_ldm.h" #include "zstd_compress_superblock.h" +#include "../common/bits.h" /* ZSTD_highbit32, ZSTD_rotateRight_U64 = */ =20 /* *************************************************************** * Tuning parameters @@ -44,7 +47,7 @@ * in log format, aka 17 =3D> 1 << 17 =3D=3D 128Ki positions. * This structure is only used in zstd_opt. * Since allocation is centralized for all strategies, it has to be known = here. - * The actual (selected) size of the hash table is then stored in ZSTD_mat= chState_t.hashLog3, + * The actual (selected) size of the hash table is then stored in ZSTD_Mat= chState_t.hashLog3, * so that zstd_opt.c doesn't need to know about this constant. */ #ifndef ZSTD_HASHLOG3_MAX @@ -55,14 +58,17 @@ * Helper functions ***************************************/ /* ZSTD_compressBound() - * Note that the result from this function is only compatible with the "no= rmal" - * full-block strategy. - * When there are a lot of small blocks due to frequent flush in streaming= mode - * the overhead of headers can make the compressed data to be larger than = the - * return value of ZSTD_compressBound(). + * Note that the result from this function is only valid for + * the one-pass compression functions. + * When employing the streaming mode, + * if flushes are frequently altering the size of blocks, + * the overhead from block headers can make the compressed data larger + * than the return value of ZSTD_compressBound(). */ size_t ZSTD_compressBound(size_t srcSize) { - return ZSTD_COMPRESSBOUND(srcSize); + size_t const r =3D ZSTD_COMPRESSBOUND(srcSize); + if (r=3D=3D0) return ERROR(srcSize_wrong); + return r; } =20 =20 @@ -75,12 +81,12 @@ struct ZSTD_CDict_s { ZSTD_dictContentType_e dictContentType; /* The dictContentType the CDi= ct was created with */ U32* entropyWorkspace; /* entropy workspace of HUF_WORKSPACE_SIZE byte= s */ ZSTD_cwksp workspace; - ZSTD_matchState_t matchState; + ZSTD_MatchState_t matchState; ZSTD_compressedBlockState_t cBlockState; ZSTD_customMem customMem; U32 dictID; int compressionLevel; /* 0 indicates that advanced API was used to sel= ect CDict params */ - ZSTD_paramSwitch_e useRowMatchFinder; /* Indicates whether the CDict w= as created with params that would use + ZSTD_ParamSwitch_e useRowMatchFinder; /* Indicates whether the CDict w= as created with params that would use * row-based matchfinder. Unless= the cdict is reloaded, we will use * the same greedy/lazy matchfin= der at compression time. */ @@ -130,11 +136,12 @@ ZSTD_CCtx* ZSTD_initStaticCCtx(void* workspace, size_= t workspaceSize) ZSTD_cwksp_move(&cctx->workspace, &ws); cctx->staticSize =3D workspaceSize; =20 - /* statically sized space. 
entropyWorkspace never moves (but prev/next= block swap places) */ - if (!ZSTD_cwksp_check_available(&cctx->workspace, ENTROPY_WORKSPACE_SI= ZE + 2 * sizeof(ZSTD_compressedBlockState_t))) return NULL; + /* statically sized space. tmpWorkspace never moves (but prev/next blo= ck swap places) */ + if (!ZSTD_cwksp_check_available(&cctx->workspace, TMP_WORKSPACE_SIZE += 2 * sizeof(ZSTD_compressedBlockState_t))) return NULL; cctx->blockState.prevCBlock =3D (ZSTD_compressedBlockState_t*)ZSTD_cwk= sp_reserve_object(&cctx->workspace, sizeof(ZSTD_compressedBlockState_t)); cctx->blockState.nextCBlock =3D (ZSTD_compressedBlockState_t*)ZSTD_cwk= sp_reserve_object(&cctx->workspace, sizeof(ZSTD_compressedBlockState_t)); - cctx->entropyWorkspace =3D (U32*)ZSTD_cwksp_reserve_object(&cctx->work= space, ENTROPY_WORKSPACE_SIZE); + cctx->tmpWorkspace =3D ZSTD_cwksp_reserve_object(&cctx->workspace, TMP= _WORKSPACE_SIZE); + cctx->tmpWkspSize =3D TMP_WORKSPACE_SIZE; cctx->bmi2 =3D ZSTD_cpuid_bmi2(ZSTD_cpuid()); return cctx; } @@ -168,15 +175,13 @@ static void ZSTD_freeCCtxContent(ZSTD_CCtx* cctx) =20 size_t ZSTD_freeCCtx(ZSTD_CCtx* cctx) { + DEBUGLOG(3, "ZSTD_freeCCtx (address: %p)", (void*)cctx); if (cctx=3D=3DNULL) return 0; /* support free on NULL */ RETURN_ERROR_IF(cctx->staticSize, memory_allocation, "not compatible with static CCtx"); - { - int cctxInWorkspace =3D ZSTD_cwksp_owns_buffer(&cctx->workspace, c= ctx); + { int cctxInWorkspace =3D ZSTD_cwksp_owns_buffer(&cctx->workspace, c= ctx); ZSTD_freeCCtxContent(cctx); - if (!cctxInWorkspace) { - ZSTD_customFree(cctx, cctx->customMem); - } + if (!cctxInWorkspace) ZSTD_customFree(cctx, cctx->customMem); } return 0; } @@ -205,7 +210,7 @@ size_t ZSTD_sizeof_CStream(const ZSTD_CStream* zcs) } =20 /* private API call, for dictBuilder only */ -const seqStore_t* ZSTD_getSeqStore(const ZSTD_CCtx* ctx) { return &(ctx->s= eqStore); } +const SeqStore_t* ZSTD_getSeqStore(const ZSTD_CCtx* ctx) { return &(ctx->s= eqStore); } =20 /* Returns true if the strategy supports using a row based matchfinder */ static int ZSTD_rowMatchFinderSupported(const ZSTD_strategy strategy) { @@ -215,32 +220,27 @@ static int ZSTD_rowMatchFinderSupported(const ZSTD_st= rategy strategy) { /* Returns true if the strategy and useRowMatchFinder mode indicate that w= e will use the row based matchfinder * for this compression. */ -static int ZSTD_rowMatchFinderUsed(const ZSTD_strategy strategy, const ZST= D_paramSwitch_e mode) { +static int ZSTD_rowMatchFinderUsed(const ZSTD_strategy strategy, const ZST= D_ParamSwitch_e mode) { assert(mode !=3D ZSTD_ps_auto); return ZSTD_rowMatchFinderSupported(strategy) && (mode =3D=3D ZSTD_ps_= enable); } =20 /* Returns row matchfinder usage given an initial mode and cParams */ -static ZSTD_paramSwitch_e ZSTD_resolveRowMatchFinderMode(ZSTD_paramSwitch_= e mode, +static ZSTD_ParamSwitch_e ZSTD_resolveRowMatchFinderMode(ZSTD_ParamSwitch_= e mode, const ZSTD_compre= ssionParameters* const cParams) { -#if defined(ZSTD_ARCH_X86_SSE2) || defined(ZSTD_ARCH_ARM_NEON) - int const kHasSIMD128 =3D 1; -#else - int const kHasSIMD128 =3D 0; -#endif + /* The Linux Kernel does not use SIMD, and 128KB is a very common size= , e.g. in BtrFS. + * The row match finder is slower for this size without SIMD, so disab= le it. 
+ */ + const unsigned kWindowLogLowerBound =3D 17; if (mode !=3D ZSTD_ps_auto) return mode; /* if requested enabled, but = no SIMD, we still will use row matchfinder */ mode =3D ZSTD_ps_disable; if (!ZSTD_rowMatchFinderSupported(cParams->strategy)) return mode; - if (kHasSIMD128) { - if (cParams->windowLog > 14) mode =3D ZSTD_ps_enable; - } else { - if (cParams->windowLog > 17) mode =3D ZSTD_ps_enable; - } + if (cParams->windowLog > kWindowLogLowerBound) mode =3D ZSTD_ps_enable; return mode; } =20 /* Returns block splitter usage (generally speaking, when using slower/str= onger compression modes) */ -static ZSTD_paramSwitch_e ZSTD_resolveBlockSplitterMode(ZSTD_paramSwitch_e= mode, +static ZSTD_ParamSwitch_e ZSTD_resolveBlockSplitterMode(ZSTD_ParamSwitch_e= mode, const ZSTD_compres= sionParameters* const cParams) { if (mode !=3D ZSTD_ps_auto) return mode; return (cParams->strategy >=3D ZSTD_btopt && cParams->windowLog >=3D 1= 7) ? ZSTD_ps_enable : ZSTD_ps_disable; @@ -248,7 +248,7 @@ static ZSTD_paramSwitch_e ZSTD_resolveBlockSplitterMode= (ZSTD_paramSwitch_e mode, =20 /* Returns 1 if the arguments indicate that we should allocate a chainTabl= e, 0 otherwise */ static int ZSTD_allocateChainTable(const ZSTD_strategy strategy, - const ZSTD_paramSwitch_e useRowMatchFin= der, + const ZSTD_ParamSwitch_e useRowMatchFin= der, const U32 forDDSDict) { assert(useRowMatchFinder !=3D ZSTD_ps_auto); /* We always should allocate a chaintable if we are allocating a match= state for a DDS dictionary matchstate. @@ -257,16 +257,44 @@ static int ZSTD_allocateChainTable(const ZSTD_strateg= y strategy, return forDDSDict || ((strategy !=3D ZSTD_fast) && !ZSTD_rowMatchFinde= rUsed(strategy, useRowMatchFinder)); } =20 -/* Returns 1 if compression parameters are such that we should +/* Returns ZSTD_ps_enable if compression parameters are such that we should * enable long distance matching (wlog >=3D 27, strategy >=3D btopt). - * Returns 0 otherwise. + * Returns ZSTD_ps_disable otherwise. */ -static ZSTD_paramSwitch_e ZSTD_resolveEnableLdm(ZSTD_paramSwitch_e mode, +static ZSTD_ParamSwitch_e ZSTD_resolveEnableLdm(ZSTD_ParamSwitch_e mode, const ZSTD_compressionParameters* const c= Params) { if (mode !=3D ZSTD_ps_auto) return mode; return (cParams->strategy >=3D ZSTD_btopt && cParams->windowLog >=3D 2= 7) ? ZSTD_ps_enable : ZSTD_ps_disable; } =20 +static int ZSTD_resolveExternalSequenceValidation(int mode) { + return mode; +} + +/* Resolves maxBlockSize to the default if no value is present. */ +static size_t ZSTD_resolveMaxBlockSize(size_t maxBlockSize) { + if (maxBlockSize =3D=3D 0) { + return ZSTD_BLOCKSIZE_MAX; + } else { + return maxBlockSize; + } +} + +static ZSTD_ParamSwitch_e ZSTD_resolveExternalRepcodeSearch(ZSTD_ParamSwit= ch_e value, int cLevel) { + if (value !=3D ZSTD_ps_auto) return value; + if (cLevel < 10) { + return ZSTD_ps_disable; + } else { + return ZSTD_ps_enable; + } +} + +/* Returns 1 if compression parameters are such that CDict hashtable and c= haintable indices are tagged. + * If so, the tags need to be removed in ZSTD_resetCCtx_byCopyingCDict. 
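
A small but pervasive pattern in this import: every parameter that accepts ZSTD_ps_auto is pinned to a concrete enable/disable value early, through a resolve helper, so downstream code never has to handle "auto". The kernel-specific twist above is the row match finder: since the kernel build uses no SIMD, it is only enabled for windowLog > 17 (128 KB being a very common size, e.g. in BtrFS). The helper shape, condensed (this mirrors ZSTD_resolveExternalRepcodeSearch above; it is a restatement, not a new API):

    static ZSTD_ParamSwitch_e resolveByLevel(ZSTD_ParamSwitch_e value, int cLevel)
    {
        if (value != ZSTD_ps_auto)
            return value;                    /* explicit user choice wins */
        return (cLevel < 10) ? ZSTD_ps_disable : ZSTD_ps_enable;
    }
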
*/ +static int ZSTD_CDictIndicesAreTagged(const ZSTD_compressionParameters* co= nst cParams) { + return cParams->strategy =3D=3D ZSTD_fast || cParams->strategy =3D=3D = ZSTD_dfast; +} + static ZSTD_CCtx_params ZSTD_makeCCtxParamsFromCParams( ZSTD_compressionParameters cParams) { @@ -282,8 +310,12 @@ static ZSTD_CCtx_params ZSTD_makeCCtxParamsFromCParams( assert(cctxParams.ldmParams.hashLog >=3D cctxParams.ldmParams.buck= etSizeLog); assert(cctxParams.ldmParams.hashRateLog < 32); } - cctxParams.useBlockSplitter =3D ZSTD_resolveBlockSplitterMode(cctxPara= ms.useBlockSplitter, &cParams); + cctxParams.postBlockSplitter =3D ZSTD_resolveBlockSplitterMode(cctxPar= ams.postBlockSplitter, &cParams); cctxParams.useRowMatchFinder =3D ZSTD_resolveRowMatchFinderMode(cctxPa= rams.useRowMatchFinder, &cParams); + cctxParams.validateSequences =3D ZSTD_resolveExternalSequenceValidatio= n(cctxParams.validateSequences); + cctxParams.maxBlockSize =3D ZSTD_resolveMaxBlockSize(cctxParams.maxBlo= ckSize); + cctxParams.searchForExternalRepcodes =3D ZSTD_resolveExternalRepcodeSe= arch(cctxParams.searchForExternalRepcodes, + = cctxParams.compressionLevel); assert(!ZSTD_checkCParams(cParams)); return cctxParams; } @@ -329,10 +361,13 @@ size_t ZSTD_CCtxParams_init(ZSTD_CCtx_params* cctxPar= ams, int compressionLevel) #define ZSTD_NO_CLEVEL 0 =20 /* - * Initializes the cctxParams from params and compressionLevel. + * Initializes `cctxParams` from `params` and `compressionLevel`. * @param compressionLevel If params are derived from a compression level = then that compression level, otherwise ZSTD_NO_CLEVEL. */ -static void ZSTD_CCtxParams_init_internal(ZSTD_CCtx_params* cctxParams, ZS= TD_parameters const* params, int compressionLevel) +static void +ZSTD_CCtxParams_init_internal(ZSTD_CCtx_params* cctxParams, + const ZSTD_parameters* params, + int compressionLevel) { assert(!ZSTD_checkCParams(params->cParams)); ZSTD_memset(cctxParams, 0, sizeof(*cctxParams)); @@ -343,10 +378,13 @@ static void ZSTD_CCtxParams_init_internal(ZSTD_CCtx_p= arams* cctxParams, ZSTD_par */ cctxParams->compressionLevel =3D compressionLevel; cctxParams->useRowMatchFinder =3D ZSTD_resolveRowMatchFinderMode(cctxP= arams->useRowMatchFinder, ¶ms->cParams); - cctxParams->useBlockSplitter =3D ZSTD_resolveBlockSplitterMode(cctxPar= ams->useBlockSplitter, ¶ms->cParams); + cctxParams->postBlockSplitter =3D ZSTD_resolveBlockSplitterMode(cctxPa= rams->postBlockSplitter, ¶ms->cParams); cctxParams->ldmParams.enableLdm =3D ZSTD_resolveEnableLdm(cctxParams->= ldmParams.enableLdm, ¶ms->cParams); + cctxParams->validateSequences =3D ZSTD_resolveExternalSequenceValidati= on(cctxParams->validateSequences); + cctxParams->maxBlockSize =3D ZSTD_resolveMaxBlockSize(cctxParams->maxB= lockSize); + cctxParams->searchForExternalRepcodes =3D ZSTD_resolveExternalRepcodeS= earch(cctxParams->searchForExternalRepcodes, compressionLevel); DEBUGLOG(4, "ZSTD_CCtxParams_init_internal: useRowMatchFinder=3D%d, us= eBlockSplitter=3D%d ldm=3D%d", - cctxParams->useRowMatchFinder, cctxParams->useBlockSplitte= r, cctxParams->ldmParams.enableLdm); + cctxParams->useRowMatchFinder, cctxParams->postBlockSplitt= er, cctxParams->ldmParams.enableLdm); } =20 size_t ZSTD_CCtxParams_init_advanced(ZSTD_CCtx_params* cctxParams, ZSTD_pa= rameters params) @@ -359,7 +397,7 @@ size_t ZSTD_CCtxParams_init_advanced(ZSTD_CCtx_params* = cctxParams, ZSTD_paramete =20 /* * Sets cctxParams' cParams and fParams from params, but otherwise leaves = them alone. - * @param param Validated zstd parameters. 
+ * @param params Validated zstd parameters. */ static void ZSTD_CCtxParams_setZstdParams( ZSTD_CCtx_params* cctxParams, const ZSTD_parameters* params) @@ -455,8 +493,8 @@ ZSTD_bounds ZSTD_cParam_getBounds(ZSTD_cParameter param) return bounds; =20 case ZSTD_c_enableLongDistanceMatching: - bounds.lowerBound =3D 0; - bounds.upperBound =3D 1; + bounds.lowerBound =3D (int)ZSTD_ps_auto; + bounds.upperBound =3D (int)ZSTD_ps_disable; return bounds; =20 case ZSTD_c_ldmHashLog: @@ -534,11 +572,16 @@ ZSTD_bounds ZSTD_cParam_getBounds(ZSTD_cParameter par= am) bounds.upperBound =3D 1; return bounds; =20 - case ZSTD_c_useBlockSplitter: + case ZSTD_c_splitAfterSequences: bounds.lowerBound =3D (int)ZSTD_ps_auto; bounds.upperBound =3D (int)ZSTD_ps_disable; return bounds; =20 + case ZSTD_c_blockSplitterLevel: + bounds.lowerBound =3D 0; + bounds.upperBound =3D ZSTD_BLOCKSPLITTER_LEVEL_MAX; + return bounds; + case ZSTD_c_useRowMatchFinder: bounds.lowerBound =3D (int)ZSTD_ps_auto; bounds.upperBound =3D (int)ZSTD_ps_disable; @@ -549,6 +592,26 @@ ZSTD_bounds ZSTD_cParam_getBounds(ZSTD_cParameter para= m) bounds.upperBound =3D 1; return bounds; =20 + case ZSTD_c_prefetchCDictTables: + bounds.lowerBound =3D (int)ZSTD_ps_auto; + bounds.upperBound =3D (int)ZSTD_ps_disable; + return bounds; + + case ZSTD_c_enableSeqProducerFallback: + bounds.lowerBound =3D 0; + bounds.upperBound =3D 1; + return bounds; + + case ZSTD_c_maxBlockSize: + bounds.lowerBound =3D ZSTD_BLOCKSIZE_MAX_MIN; + bounds.upperBound =3D ZSTD_BLOCKSIZE_MAX; + return bounds; + + case ZSTD_c_repcodeResolution: + bounds.lowerBound =3D (int)ZSTD_ps_auto; + bounds.upperBound =3D (int)ZSTD_ps_disable; + return bounds; + default: bounds.error =3D ERROR(parameter_unsupported); return bounds; @@ -567,10 +630,11 @@ static size_t ZSTD_cParam_clampBounds(ZSTD_cParameter= cParam, int* value) return 0; } =20 -#define BOUNDCHECK(cParam, val) { \ - RETURN_ERROR_IF(!ZSTD_cParam_withinBounds(cParam,val), \ - parameter_outOfBound, "Param out of bounds"); \ -} +#define BOUNDCHECK(cParam, val) \ + do { \ + RETURN_ERROR_IF(!ZSTD_cParam_withinBounds(cParam,val), \ + parameter_outOfBound, "Param out of bounds"); \ + } while (0) =20 =20 static int ZSTD_isUpdateAuthorized(ZSTD_cParameter param) @@ -584,6 +648,7 @@ static int ZSTD_isUpdateAuthorized(ZSTD_cParameter para= m) case ZSTD_c_minMatch: case ZSTD_c_targetLength: case ZSTD_c_strategy: + case ZSTD_c_blockSplitterLevel: return 1; =20 case ZSTD_c_format: @@ -610,9 +675,13 @@ static int ZSTD_isUpdateAuthorized(ZSTD_cParameter par= am) case ZSTD_c_stableOutBuffer: case ZSTD_c_blockDelimiters: case ZSTD_c_validateSequences: - case ZSTD_c_useBlockSplitter: + case ZSTD_c_splitAfterSequences: case ZSTD_c_useRowMatchFinder: case ZSTD_c_deterministicRefPrefix: + case ZSTD_c_prefetchCDictTables: + case ZSTD_c_enableSeqProducerFallback: + case ZSTD_c_maxBlockSize: + case ZSTD_c_repcodeResolution: default: return 0; } @@ -625,7 +694,7 @@ size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cPa= rameter param, int value) if (ZSTD_isUpdateAuthorized(param)) { cctx->cParamsChanged =3D 1; } else { - RETURN_ERROR(stage_wrong, "can only set params in ctx init sta= ge"); + RETURN_ERROR(stage_wrong, "can only set params in cctx init st= age"); } } =20 switch(param) @@ -665,9 +734,14 @@ size_t ZSTD_CCtx_setParameter(ZSTD_CCtx* cctx, ZSTD_cP= arameter param, int value) case ZSTD_c_stableOutBuffer: case ZSTD_c_blockDelimiters: case ZSTD_c_validateSequences: - case ZSTD_c_useBlockSplitter: + case ZSTD_c_splitAfterSequences: + case 
ZSTD_c_blockSplitterLevel: case ZSTD_c_useRowMatchFinder: case ZSTD_c_deterministicRefPrefix: + case ZSTD_c_prefetchCDictTables: + case ZSTD_c_enableSeqProducerFallback: + case ZSTD_c_maxBlockSize: + case ZSTD_c_repcodeResolution: break; =20 default: RETURN_ERROR(parameter_unsupported, "unknown parameter"); @@ -723,12 +797,12 @@ size_t ZSTD_CCtxParams_setParameter(ZSTD_CCtx_params*= CCtxParams, case ZSTD_c_minMatch : if (value!=3D0) /* 0 =3D> use default */ BOUNDCHECK(ZSTD_c_minMatch, value); - CCtxParams->cParams.minMatch =3D value; + CCtxParams->cParams.minMatch =3D (U32)value; return CCtxParams->cParams.minMatch; =20 case ZSTD_c_targetLength : BOUNDCHECK(ZSTD_c_targetLength, value); - CCtxParams->cParams.targetLength =3D value; + CCtxParams->cParams.targetLength =3D (U32)value; return CCtxParams->cParams.targetLength; =20 case ZSTD_c_strategy : @@ -741,12 +815,12 @@ size_t ZSTD_CCtxParams_setParameter(ZSTD_CCtx_params*= CCtxParams, /* Content size written in frame header _when known_ (default:1) */ DEBUGLOG(4, "set content size flag =3D %u", (value!=3D0)); CCtxParams->fParams.contentSizeFlag =3D value !=3D 0; - return CCtxParams->fParams.contentSizeFlag; + return (size_t)CCtxParams->fParams.contentSizeFlag; =20 case ZSTD_c_checksumFlag : /* A 32-bits content checksum will be calculated and written at en= d of frame (default:0) */ CCtxParams->fParams.checksumFlag =3D value !=3D 0; - return CCtxParams->fParams.checksumFlag; + return (size_t)CCtxParams->fParams.checksumFlag; =20 case ZSTD_c_dictIDFlag : /* When applicable, dictionary's dictID is pr= ovided in frame header (default:1) */ DEBUGLOG(4, "set dictIDFlag =3D %u", (value!=3D0)); @@ -755,18 +829,18 @@ size_t ZSTD_CCtxParams_setParameter(ZSTD_CCtx_params*= CCtxParams, =20 case ZSTD_c_forceMaxWindow : CCtxParams->forceWindow =3D (value !=3D 0); - return CCtxParams->forceWindow; + return (size_t)CCtxParams->forceWindow; =20 case ZSTD_c_forceAttachDict : { const ZSTD_dictAttachPref_e pref =3D (ZSTD_dictAttachPref_e)value; - BOUNDCHECK(ZSTD_c_forceAttachDict, pref); + BOUNDCHECK(ZSTD_c_forceAttachDict, (int)pref); CCtxParams->attachDictPref =3D pref; return CCtxParams->attachDictPref; } =20 case ZSTD_c_literalCompressionMode : { - const ZSTD_paramSwitch_e lcm =3D (ZSTD_paramSwitch_e)value; - BOUNDCHECK(ZSTD_c_literalCompressionMode, lcm); + const ZSTD_ParamSwitch_e lcm =3D (ZSTD_ParamSwitch_e)value; + BOUNDCHECK(ZSTD_c_literalCompressionMode, (int)lcm); CCtxParams->literalCompressionMode =3D lcm; return CCtxParams->literalCompressionMode; } @@ -789,47 +863,50 @@ size_t ZSTD_CCtxParams_setParameter(ZSTD_CCtx_params*= CCtxParams, =20 case ZSTD_c_enableDedicatedDictSearch : CCtxParams->enableDedicatedDictSearch =3D (value!=3D0); - return CCtxParams->enableDedicatedDictSearch; + return (size_t)CCtxParams->enableDedicatedDictSearch; =20 case ZSTD_c_enableLongDistanceMatching : - CCtxParams->ldmParams.enableLdm =3D (ZSTD_paramSwitch_e)value; + BOUNDCHECK(ZSTD_c_enableLongDistanceMatching, value); + CCtxParams->ldmParams.enableLdm =3D (ZSTD_ParamSwitch_e)value; return CCtxParams->ldmParams.enableLdm; =20 case ZSTD_c_ldmHashLog : if (value!=3D0) /* 0 =3D=3D> auto */ BOUNDCHECK(ZSTD_c_ldmHashLog, value); - CCtxParams->ldmParams.hashLog =3D value; + CCtxParams->ldmParams.hashLog =3D (U32)value; return CCtxParams->ldmParams.hashLog; =20 case ZSTD_c_ldmMinMatch : if (value!=3D0) /* 0 =3D=3D> default */ BOUNDCHECK(ZSTD_c_ldmMinMatch, value); - CCtxParams->ldmParams.minMatchLength =3D value; + CCtxParams->ldmParams.minMatchLength =3D (U32)value; 
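
Also visible above: the statement-like macros (BOUNDCHECK here, CLAMP_TYPE further down) are rewrapped from bare { ... } blocks into do { ... } while (0). That is the standard hygiene fix for multi-statement macros used under an unbraced if/else; a minimal illustration of the failure mode it prevents (check1/check2 are hypothetical):

    #define CHECK_BRACES(x)   { check1(x); check2(x); }            /* fragile */
    #define CHECK_DOWHILE(x)  do { check1(x); check2(x); } while (0)

    /* With the brace form, the ';' the caller writes after the macro
     * terminates the if statement early, so the else no longer pairs:
     *
     *     if (cond)
     *         CHECK_BRACES(v);    // expands to { ... } ;  -- stray ';'
     *     else                    // error: else without a previous if
     *         fallback(v);
     *
     * The do/while(0) form consumes exactly one ';' and pairs correctly. */
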
return CCtxParams->ldmParams.minMatchLength; =20 case ZSTD_c_ldmBucketSizeLog : if (value!=3D0) /* 0 =3D=3D> default */ BOUNDCHECK(ZSTD_c_ldmBucketSizeLog, value); - CCtxParams->ldmParams.bucketSizeLog =3D value; + CCtxParams->ldmParams.bucketSizeLog =3D (U32)value; return CCtxParams->ldmParams.bucketSizeLog; =20 case ZSTD_c_ldmHashRateLog : if (value!=3D0) /* 0 =3D=3D> default */ BOUNDCHECK(ZSTD_c_ldmHashRateLog, value); - CCtxParams->ldmParams.hashRateLog =3D value; + CCtxParams->ldmParams.hashRateLog =3D (U32)value; return CCtxParams->ldmParams.hashRateLog; =20 case ZSTD_c_targetCBlockSize : - if (value!=3D0) /* 0 =3D=3D> default */ + if (value!=3D0) { /* 0 =3D=3D> default */ + value =3D MAX(value, ZSTD_TARGETCBLOCKSIZE_MIN); BOUNDCHECK(ZSTD_c_targetCBlockSize, value); - CCtxParams->targetCBlockSize =3D value; + } + CCtxParams->targetCBlockSize =3D (U32)value; return CCtxParams->targetCBlockSize; =20 case ZSTD_c_srcSizeHint : if (value!=3D0) /* 0 =3D=3D> default */ BOUNDCHECK(ZSTD_c_srcSizeHint, value); CCtxParams->srcSizeHint =3D value; - return CCtxParams->srcSizeHint; + return (size_t)CCtxParams->srcSizeHint; =20 case ZSTD_c_stableInBuffer: BOUNDCHECK(ZSTD_c_stableInBuffer, value); @@ -843,28 +920,55 @@ size_t ZSTD_CCtxParams_setParameter(ZSTD_CCtx_params*= CCtxParams, =20 case ZSTD_c_blockDelimiters: BOUNDCHECK(ZSTD_c_blockDelimiters, value); - CCtxParams->blockDelimiters =3D (ZSTD_sequenceFormat_e)value; + CCtxParams->blockDelimiters =3D (ZSTD_SequenceFormat_e)value; return CCtxParams->blockDelimiters; =20 case ZSTD_c_validateSequences: BOUNDCHECK(ZSTD_c_validateSequences, value); CCtxParams->validateSequences =3D value; - return CCtxParams->validateSequences; + return (size_t)CCtxParams->validateSequences; + + case ZSTD_c_splitAfterSequences: + BOUNDCHECK(ZSTD_c_splitAfterSequences, value); + CCtxParams->postBlockSplitter =3D (ZSTD_ParamSwitch_e)value; + return CCtxParams->postBlockSplitter; =20 - case ZSTD_c_useBlockSplitter: - BOUNDCHECK(ZSTD_c_useBlockSplitter, value); - CCtxParams->useBlockSplitter =3D (ZSTD_paramSwitch_e)value; - return CCtxParams->useBlockSplitter; + case ZSTD_c_blockSplitterLevel: + BOUNDCHECK(ZSTD_c_blockSplitterLevel, value); + CCtxParams->preBlockSplitter_level =3D value; + return (size_t)CCtxParams->preBlockSplitter_level; =20 case ZSTD_c_useRowMatchFinder: BOUNDCHECK(ZSTD_c_useRowMatchFinder, value); - CCtxParams->useRowMatchFinder =3D (ZSTD_paramSwitch_e)value; + CCtxParams->useRowMatchFinder =3D (ZSTD_ParamSwitch_e)value; return CCtxParams->useRowMatchFinder; =20 case ZSTD_c_deterministicRefPrefix: BOUNDCHECK(ZSTD_c_deterministicRefPrefix, value); CCtxParams->deterministicRefPrefix =3D !!value; - return CCtxParams->deterministicRefPrefix; + return (size_t)CCtxParams->deterministicRefPrefix; + + case ZSTD_c_prefetchCDictTables: + BOUNDCHECK(ZSTD_c_prefetchCDictTables, value); + CCtxParams->prefetchCDictTables =3D (ZSTD_ParamSwitch_e)value; + return CCtxParams->prefetchCDictTables; + + case ZSTD_c_enableSeqProducerFallback: + BOUNDCHECK(ZSTD_c_enableSeqProducerFallback, value); + CCtxParams->enableMatchFinderFallback =3D value; + return (size_t)CCtxParams->enableMatchFinderFallback; + + case ZSTD_c_maxBlockSize: + if (value!=3D0) /* 0 =3D=3D> default */ + BOUNDCHECK(ZSTD_c_maxBlockSize, value); + assert(value>=3D0); + CCtxParams->maxBlockSize =3D (size_t)value; + return CCtxParams->maxBlockSize; + + case ZSTD_c_repcodeResolution: + BOUNDCHECK(ZSTD_c_repcodeResolution, value); + CCtxParams->searchForExternalRepcodes =3D (ZSTD_ParamSwitch_e)valu= e; + 
return CCtxParams->searchForExternalRepcodes; =20 default: RETURN_ERROR(parameter_unsupported, "unknown parameter"); } @@ -881,7 +985,7 @@ size_t ZSTD_CCtxParams_getParameter( switch(param) { case ZSTD_c_format : - *value =3D CCtxParams->format; + *value =3D (int)CCtxParams->format; break; case ZSTD_c_compressionLevel : *value =3D CCtxParams->compressionLevel; @@ -896,16 +1000,16 @@ size_t ZSTD_CCtxParams_getParameter( *value =3D (int)CCtxParams->cParams.chainLog; break; case ZSTD_c_searchLog : - *value =3D CCtxParams->cParams.searchLog; + *value =3D (int)CCtxParams->cParams.searchLog; break; case ZSTD_c_minMatch : - *value =3D CCtxParams->cParams.minMatch; + *value =3D (int)CCtxParams->cParams.minMatch; break; case ZSTD_c_targetLength : - *value =3D CCtxParams->cParams.targetLength; + *value =3D (int)CCtxParams->cParams.targetLength; break; case ZSTD_c_strategy : - *value =3D (unsigned)CCtxParams->cParams.strategy; + *value =3D (int)CCtxParams->cParams.strategy; break; case ZSTD_c_contentSizeFlag : *value =3D CCtxParams->fParams.contentSizeFlag; @@ -920,10 +1024,10 @@ size_t ZSTD_CCtxParams_getParameter( *value =3D CCtxParams->forceWindow; break; case ZSTD_c_forceAttachDict : - *value =3D CCtxParams->attachDictPref; + *value =3D (int)CCtxParams->attachDictPref; break; case ZSTD_c_literalCompressionMode : - *value =3D CCtxParams->literalCompressionMode; + *value =3D (int)CCtxParams->literalCompressionMode; break; case ZSTD_c_nbWorkers : assert(CCtxParams->nbWorkers =3D=3D 0); @@ -939,19 +1043,19 @@ size_t ZSTD_CCtxParams_getParameter( *value =3D CCtxParams->enableDedicatedDictSearch; break; case ZSTD_c_enableLongDistanceMatching : - *value =3D CCtxParams->ldmParams.enableLdm; + *value =3D (int)CCtxParams->ldmParams.enableLdm; break; case ZSTD_c_ldmHashLog : - *value =3D CCtxParams->ldmParams.hashLog; + *value =3D (int)CCtxParams->ldmParams.hashLog; break; case ZSTD_c_ldmMinMatch : - *value =3D CCtxParams->ldmParams.minMatchLength; + *value =3D (int)CCtxParams->ldmParams.minMatchLength; break; case ZSTD_c_ldmBucketSizeLog : - *value =3D CCtxParams->ldmParams.bucketSizeLog; + *value =3D (int)CCtxParams->ldmParams.bucketSizeLog; break; case ZSTD_c_ldmHashRateLog : - *value =3D CCtxParams->ldmParams.hashRateLog; + *value =3D (int)CCtxParams->ldmParams.hashRateLog; break; case ZSTD_c_targetCBlockSize : *value =3D (int)CCtxParams->targetCBlockSize; @@ -971,8 +1075,11 @@ size_t ZSTD_CCtxParams_getParameter( case ZSTD_c_validateSequences : *value =3D (int)CCtxParams->validateSequences; break; - case ZSTD_c_useBlockSplitter : - *value =3D (int)CCtxParams->useBlockSplitter; + case ZSTD_c_splitAfterSequences : + *value =3D (int)CCtxParams->postBlockSplitter; + break; + case ZSTD_c_blockSplitterLevel : + *value =3D CCtxParams->preBlockSplitter_level; break; case ZSTD_c_useRowMatchFinder : *value =3D (int)CCtxParams->useRowMatchFinder; @@ -980,6 +1087,18 @@ size_t ZSTD_CCtxParams_getParameter( case ZSTD_c_deterministicRefPrefix: *value =3D (int)CCtxParams->deterministicRefPrefix; break; + case ZSTD_c_prefetchCDictTables: + *value =3D (int)CCtxParams->prefetchCDictTables; + break; + case ZSTD_c_enableSeqProducerFallback: + *value =3D CCtxParams->enableMatchFinderFallback; + break; + case ZSTD_c_maxBlockSize: + *value =3D (int)CCtxParams->maxBlockSize; + break; + case ZSTD_c_repcodeResolution: + *value =3D (int)CCtxParams->searchForExternalRepcodes; + break; default: RETURN_ERROR(parameter_unsupported, "unknown parameter"); } return 0; @@ -1006,9 +1125,47 @@ size_t ZSTD_CCtx_setParametersUsingCCtxParams( 
return 0; } =20 +size_t ZSTD_CCtx_setCParams(ZSTD_CCtx* cctx, ZSTD_compressionParameters cp= arams) +{ + ZSTD_STATIC_ASSERT(sizeof(cparams) =3D=3D 7 * 4 /* all params are list= ed below */); + DEBUGLOG(4, "ZSTD_CCtx_setCParams"); + /* only update if all parameters are valid */ + FORWARD_IF_ERROR(ZSTD_checkCParams(cparams), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, (int)c= params.windowLog), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_chainLog, (int)cp= arams.chainLog), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_hashLog, (int)cpa= rams.hashLog), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_searchLog, (int)c= params.searchLog), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_minMatch, (int)cp= arams.minMatch), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_targetLength, (in= t)cparams.targetLength), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_strategy, (int)cp= arams.strategy), ""); + return 0; +} + +size_t ZSTD_CCtx_setFParams(ZSTD_CCtx* cctx, ZSTD_frameParameters fparams) +{ + ZSTD_STATIC_ASSERT(sizeof(fparams) =3D=3D 3 * 4 /* all params are list= ed below */); + DEBUGLOG(4, "ZSTD_CCtx_setFParams"); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_contentSizeFlag, = fparams.contentSizeFlag !=3D 0), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_checksumFlag, fpa= rams.checksumFlag !=3D 0), ""); + FORWARD_IF_ERROR(ZSTD_CCtx_setParameter(cctx, ZSTD_c_dictIDFlag, fpara= ms.noDictIDFlag =3D=3D 0), ""); + return 0; +} + +size_t ZSTD_CCtx_setParams(ZSTD_CCtx* cctx, ZSTD_parameters params) +{ + DEBUGLOG(4, "ZSTD_CCtx_setParams"); + /* First check cParams, because we want to update all or none. */ + FORWARD_IF_ERROR(ZSTD_checkCParams(params.cParams), ""); + /* Next set fParams, because this could fail if the cctx isn't in init= stage. */ + FORWARD_IF_ERROR(ZSTD_CCtx_setFParams(cctx, params.fParams), ""); + /* Finally set cParams, which should succeed. */ + FORWARD_IF_ERROR(ZSTD_CCtx_setCParams(cctx, params.cParams), ""); + return 0; +} + size_t ZSTD_CCtx_setPledgedSrcSize(ZSTD_CCtx* cctx, unsigned long long ple= dgedSrcSize) { - DEBUGLOG(4, "ZSTD_CCtx_setPledgedSrcSize to %u bytes", (U32)pledgedSrc= Size); + DEBUGLOG(4, "ZSTD_CCtx_setPledgedSrcSize to %llu bytes", pledgedSrcSiz= e); RETURN_ERROR_IF(cctx->streamStage !=3D zcss_init, stage_wrong, "Can't set pledgedSrcSize when not in init stage."); cctx->pledgedSrcSizePlusOne =3D pledgedSrcSize+1; @@ -1024,9 +1181,9 @@ static void ZSTD_dedicatedDictSearch_revertCParams( ZSTD_compressionParameters* cParams); =20 /* - * Initializes the local dict using the requested parameters. - * NOTE: This does not use the pledged src size, because it may be used fo= r more - * than one compression. + * Initializes the local dictionary using requested parameters. + * NOTE: Initialization does not employ the pledged src size, + * because the dictionary may be used for multiple compressions. */ static size_t ZSTD_initLocalDict(ZSTD_CCtx* cctx) { @@ -1039,8 +1196,8 @@ static size_t ZSTD_initLocalDict(ZSTD_CCtx* cctx) return 0; } if (dl->cdict !=3D NULL) { - assert(cctx->cdict =3D=3D dl->cdict); /* Local dictionary already initialized. 
 size_t ZSTD_CCtx_setPledgedSrcSize(ZSTD_CCtx* cctx, unsigned long long pledgedSrcSize)
 {
-    DEBUGLOG(4, "ZSTD_CCtx_setPledgedSrcSize to %u bytes", (U32)pledgedSrcSize);
+    DEBUGLOG(4, "ZSTD_CCtx_setPledgedSrcSize to %llu bytes", pledgedSrcSize);
    RETURN_ERROR_IF(cctx->streamStage != zcss_init, stage_wrong,
                    "Can't set pledgedSrcSize when not in init stage.");
    cctx->pledgedSrcSizePlusOne = pledgedSrcSize+1;
@@ -1024,9 +1181,9 @@ static void ZSTD_dedicatedDictSearch_revertCParams(
        ZSTD_compressionParameters* cParams);

 /*
- * Initializes the local dict using the requested parameters.
- * NOTE: This does not use the pledged src size, because it may be used for more
- * than one compression.
+ * Initializes the local dictionary using requested parameters.
+ * NOTE: Initialization does not employ the pledged src size,
+ * because the dictionary may be used for multiple compressions.
 */
 static size_t ZSTD_initLocalDict(ZSTD_CCtx* cctx)
 {
@@ -1039,8 +1196,8 @@ static size_t ZSTD_initLocalDict(ZSTD_CCtx* cctx)
        return 0;
    }
    if (dl->cdict != NULL) {
-        assert(cctx->cdict == dl->cdict);
        /* Local dictionary already initialized. */
+        assert(cctx->cdict == dl->cdict);
        return 0;
    }
    assert(dl->dictSize > 0);
@@ -1060,26 +1217,30 @@ static size_t ZSTD_initLocalDict(ZSTD_CCtx* cctx)
 }

 size_t ZSTD_CCtx_loadDictionary_advanced(
-        ZSTD_CCtx* cctx, const void* dict, size_t dictSize,
-        ZSTD_dictLoadMethod_e dictLoadMethod, ZSTD_dictContentType_e dictContentType)
+        ZSTD_CCtx* cctx,
+        const void* dict, size_t dictSize,
+        ZSTD_dictLoadMethod_e dictLoadMethod,
+        ZSTD_dictContentType_e dictContentType)
 {
-    RETURN_ERROR_IF(cctx->streamStage != zcss_init, stage_wrong,
-                    "Can't load a dictionary when ctx is not in init stage.");
    DEBUGLOG(4, "ZSTD_CCtx_loadDictionary_advanced (size: %u)", (U32)dictSize);
-    ZSTD_clearAllDicts(cctx);  /* in case one already exists */
-    if (dict == NULL || dictSize == 0)  /* no dictionary mode */
+    RETURN_ERROR_IF(cctx->streamStage != zcss_init, stage_wrong,
+                    "Can't load a dictionary when cctx is not in init stage.");
+    ZSTD_clearAllDicts(cctx);  /* erase any previously set dictionary */
+    if (dict == NULL || dictSize == 0)  /* no dictionary */
        return 0;
    if (dictLoadMethod == ZSTD_dlm_byRef) {
        cctx->localDict.dict = dict;
    } else {
+        /* copy dictionary content inside CCtx to own its lifetime */
        void* dictBuffer;
        RETURN_ERROR_IF(cctx->staticSize, memory_allocation,
-                        "no malloc for static CCtx");
+                        "static CCtx can't allocate for an internal copy of dictionary");
        dictBuffer = ZSTD_customMalloc(dictSize, cctx->customMem);
-        RETURN_ERROR_IF(!dictBuffer, memory_allocation, "NULL pointer!");
+        RETURN_ERROR_IF(dictBuffer==NULL, memory_allocation,
+                        "allocation failed for dictionary content");
        ZSTD_memcpy(dictBuffer, dict, dictSize);
-        cctx->localDict.dictBuffer = dictBuffer;
-        cctx->localDict.dict = dictBuffer;
+        cctx->localDict.dictBuffer = dictBuffer;   /* owned ptr to free */
+        cctx->localDict.dict = dictBuffer;         /* read-only reference */
    }
    cctx->localDict.dictSize = dictSize;
    cctx->localDict.dictContentType = dictContentType;
@@ -1149,7 +1310,7 @@ size_t ZSTD_CCtx_reset(ZSTD_CCtx* cctx, ZSTD_ResetDirective reset)
    if ( (reset == ZSTD_reset_parameters)
      || (reset == ZSTD_reset_session_and_parameters) ) {
        RETURN_ERROR_IF(cctx->streamStage != zcss_init, stage_wrong,
-                        "Can't reset parameters only when not in init stage.");
+                        "Reset parameters is only possible during init stage.");
        ZSTD_clearAllDicts(cctx);
        return ZSTD_CCtxParams_reset(&cctx->requestedParams);
    }
@@ -1168,7 +1329,7 @@ size_t ZSTD_checkCParams(ZSTD_compressionParameters cParams)
    BOUNDCHECK(ZSTD_c_searchLog,   (int)cParams.searchLog);
    BOUNDCHECK(ZSTD_c_minMatch,    (int)cParams.minMatch);
    BOUNDCHECK(ZSTD_c_targetLength,(int)cParams.targetLength);
-    BOUNDCHECK(ZSTD_c_strategy,    cParams.strategy);
+    BOUNDCHECK(ZSTD_c_strategy,    (int)cParams.strategy);
    return 0;
 }

@@ -1178,11 +1339,12 @@ size_t ZSTD_checkCParams(ZSTD_compressionParameters cParams)
 static ZSTD_compressionParameters
 ZSTD_clampCParams(ZSTD_compressionParameters cParams)
 {
-#   define CLAMP_TYPE(cParam, val, type) {                                \
-        ZSTD_bounds const bounds = ZSTD_cParam_getBounds(cParam);         \
-        if ((int)val<bounds.lowerBound) val=(type)bounds.lowerBound;      \
-        else if ((int)val>bounds.upperBound) val=(type)bounds.upperBound; \
-    }
+#   define CLAMP_TYPE(cParam, val, type)                                  \
+    do {                                                                  \
+        ZSTD_bounds const bounds = ZSTD_cParam_getBounds(cParam);         \
+        if ((int)val<bounds.lowerBound) val=(type)bounds.lowerBound;      \
+        else if ((int)val>bounds.upperBound) val=(type)bounds.upperBound; \
+    } while (0)
 #   define CLAMP(cParam, val) CLAMP_TYPE(cParam, val, unsigned)
    CLAMP(ZSTD_c_windowLog, cParams.windowLog);
    CLAMP(ZSTD_c_chainLog,  cParams.chainLog);
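The CLAMP_TYPE change above is pure macro hygiene: wrapping the body in do { ... } while (0) makes the expansion behave like a single statement, so a trailing semicolon parses correctly everywhere. A minimal sketch of the failure mode the old brace-only form has (illustrative names, not from the patch):

    #define BAD(x)  { if ((x) < 0) (x) = 0; }
    #define GOOD(x) do { if ((x) < 0) (x) = 0; } while (0)

    if (cond)
        BAD(v);   /* expands to {...}; - the stray ';' terminates the 'if',
                   * so a following 'else' fails to compile */
    else
        other();

    if (cond)
        GOOD(v);  /* a single statement: 'else' binds as expected */
    else
        other();
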
@@ -1240,19 +1402,62 @@ static U32 ZSTD_dictAndWindowLog(U32 windowLog, U64 srcSize, U64 dictSize)
  * optimize `cPar` for a specified input (`srcSize` and `dictSize`).
  * mostly downsize to reduce memory consumption and initialization latency.
  * `srcSize` can be ZSTD_CONTENTSIZE_UNKNOWN when not known.
- * `mode` is the mode for parameter adjustment. See docs for `ZSTD_cParamMode_e`.
+ * `mode` is the mode for parameter adjustment. See docs for `ZSTD_CParamMode_e`.
  *  note : `srcSize==0` means 0!
  *  condition : cPar is presumed validated (can be checked using ZSTD_checkCParams()).
  */
 static ZSTD_compressionParameters
 ZSTD_adjustCParams_internal(ZSTD_compressionParameters cPar,
                            unsigned long long srcSize,
                            size_t dictSize,
-                            ZSTD_cParamMode_e mode)
+                            ZSTD_CParamMode_e mode,
+                            ZSTD_ParamSwitch_e useRowMatchFinder)
 {
    const U64 minSrcSize = 513; /* (1<<9) + 1 */
    const U64 maxWindowResize = 1ULL << (ZSTD_WINDOWLOG_MAX-1);
    assert(ZSTD_checkCParams(cPar)==0);

+    /* Cascade the selected strategy down to the next-highest one built into
+     * this binary. */
+#ifdef ZSTD_EXCLUDE_BTULTRA_BLOCK_COMPRESSOR
+    if (cPar.strategy == ZSTD_btultra2) {
+        cPar.strategy = ZSTD_btultra;
+    }
+    if (cPar.strategy == ZSTD_btultra) {
+        cPar.strategy = ZSTD_btopt;
+    }
+#endif
+#ifdef ZSTD_EXCLUDE_BTOPT_BLOCK_COMPRESSOR
+    if (cPar.strategy == ZSTD_btopt) {
+        cPar.strategy = ZSTD_btlazy2;
+    }
+#endif
+#ifdef ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR
+    if (cPar.strategy == ZSTD_btlazy2) {
+        cPar.strategy = ZSTD_lazy2;
+    }
+#endif
+#ifdef ZSTD_EXCLUDE_LAZY2_BLOCK_COMPRESSOR
+    if (cPar.strategy == ZSTD_lazy2) {
+        cPar.strategy = ZSTD_lazy;
+    }
+#endif
+#ifdef ZSTD_EXCLUDE_LAZY_BLOCK_COMPRESSOR
+    if (cPar.strategy == ZSTD_lazy) {
+        cPar.strategy = ZSTD_greedy;
+    }
+#endif
+#ifdef ZSTD_EXCLUDE_GREEDY_BLOCK_COMPRESSOR
+    if (cPar.strategy == ZSTD_greedy) {
+        cPar.strategy = ZSTD_dfast;
+    }
+#endif
+#ifdef ZSTD_EXCLUDE_DFAST_BLOCK_COMPRESSOR
+    if (cPar.strategy == ZSTD_dfast) {
+        cPar.strategy = ZSTD_fast;
+        cPar.targetLength = 0;
+    }
+#endif
+
    switch (mode) {
    case ZSTD_cpm_unknown:
    case ZSTD_cpm_noAttachDict:
@@ -1281,8 +1486,8 @@ ZSTD_adjustCParams_internal(ZSTD_compressionParameters cPar,
    }

    /* resize windowLog if input is small enough, to use less memory */
-    if ( (srcSize < maxWindowResize)
-      && (dictSize < maxWindowResize) ) {
+    if ( (srcSize <= maxWindowResize)
+      && (dictSize <= maxWindowResize) ) {
        U32 const tSize = (U32)(srcSize + dictSize);
        static U32 const hashSizeMin = 1 << ZSTD_HASHLOG_MIN;
        U32 const srcLog = (tSize < hashSizeMin) ? ZSTD_HASHLOG_MIN :
@@ -1300,6 +1505,42 @@ ZSTD_adjustCParams_internal(ZSTD_compressionParameters cPar,
    if (cPar.windowLog < ZSTD_WINDOWLOG_ABSOLUTEMIN)
        cPar.windowLog = ZSTD_WINDOWLOG_ABSOLUTEMIN;  /* minimum wlog required for valid frame header */

+    /* We can't use more than 32 bits of hash in total, so that means that we require:
+     * (hashLog + 8) <= 32 && (chainLog + 8) <= 32
+     */
+    if (mode == ZSTD_cpm_createCDict && ZSTD_CDictIndicesAreTagged(&cPar)) {
+        U32 const maxShortCacheHashLog = 32 - ZSTD_SHORT_CACHE_TAG_BITS;
+        if (cPar.hashLog > maxShortCacheHashLog) {
+            cPar.hashLog = maxShortCacheHashLog;
+        }
+        if (cPar.chainLog > maxShortCacheHashLog) {
+            cPar.chainLog = maxShortCacheHashLog;
+        }
+    }
+
+    /* At this point, we aren't 100% sure if we are using the row match finder.
+     * Unless it is explicitly disabled, conservatively assume that it is enabled.
+     * In this case it will only be disabled for small sources, so shrinking the
+     * hash log a little bit shouldn't result in any ratio loss.
+     */
+    if (useRowMatchFinder == ZSTD_ps_auto)
+        useRowMatchFinder = ZSTD_ps_enable;
+
+    /* We can't hash more than 32-bits in total. So that means that we require:
+     * (hashLog - rowLog + 8) <= 32
+     */
+    if (ZSTD_rowMatchFinderUsed(cPar.strategy, useRowMatchFinder)) {
+        /* Switch to 32-entry rows if searchLog is 5 (or more) */
+        U32 const rowLog = BOUNDED(4, cPar.searchLog, 6);
+        U32 const maxRowHashLog = 32 - ZSTD_ROW_HASH_TAG_BITS;
+        U32 const maxHashLog = maxRowHashLog + rowLog;
+        assert(cPar.hashLog >= rowLog);
+        if (cPar.hashLog > maxHashLog) {
+            cPar.hashLog = maxHashLog;
+        }
+    }
+
    return cPar;
 }

@@ -1310,11 +1551,11 @@ ZSTD_adjustCParams(ZSTD_compressionParameters cPar,
 {
    cPar = ZSTD_clampCParams(cPar);   /* resulting cPar is necessarily valid (all parameters within range) */
    if (srcSize == 0) srcSize = ZSTD_CONTENTSIZE_UNKNOWN;
-    return ZSTD_adjustCParams_internal(cPar, srcSize, dictSize, ZSTD_cpm_unknown);
+    return ZSTD_adjustCParams_internal(cPar, srcSize, dictSize, ZSTD_cpm_unknown, ZSTD_ps_auto);
 }

-static ZSTD_compressionParameters ZSTD_getCParams_internal(int compressionLevel, unsigned long long srcSizeHint, size_t dictSize, ZSTD_cParamMode_e mode);
-static ZSTD_parameters ZSTD_getParams_internal(int compressionLevel, unsigned long long srcSizeHint, size_t dictSize, ZSTD_cParamMode_e mode);
+static ZSTD_compressionParameters ZSTD_getCParams_internal(int compressionLevel, unsigned long long srcSizeHint, size_t dictSize, ZSTD_CParamMode_e mode);
+static ZSTD_parameters ZSTD_getParams_internal(int compressionLevel, unsigned long long srcSizeHint, size_t dictSize, ZSTD_CParamMode_e mode);

 static void ZSTD_overrideCParams(
        ZSTD_compressionParameters* cParams,
@@ -1330,24 +1571,25 @@ static void ZSTD_overrideCParams(
 }

 ZSTD_compressionParameters ZSTD_getCParamsFromCCtxParams(
-        const ZSTD_CCtx_params* CCtxParams, U64 srcSizeHint, size_t dictSize, ZSTD_cParamMode_e mode)
+        const ZSTD_CCtx_params* CCtxParams, U64 srcSizeHint, size_t dictSize, ZSTD_CParamMode_e mode)
 {
    ZSTD_compressionParameters cParams;
    if (srcSizeHint == ZSTD_CONTENTSIZE_UNKNOWN && CCtxParams->srcSizeHint > 0) {
-        srcSizeHint = CCtxParams->srcSizeHint;
+        assert(CCtxParams->srcSizeHint>=0);
+        srcSizeHint = (U64)CCtxParams->srcSizeHint;
    }
    cParams = ZSTD_getCParams_internal(CCtxParams->compressionLevel, srcSizeHint, dictSize, mode);
    if (CCtxParams->ldmParams.enableLdm == ZSTD_ps_enable) cParams.windowLog = ZSTD_LDM_DEFAULT_WINDOW_LOG;
    ZSTD_overrideCParams(&cParams, &CCtxParams->cParams);
    assert(!ZSTD_checkCParams(cParams));
    /* srcSizeHint == 0 means 0 */
-    return ZSTD_adjustCParams_internal(cParams, srcSizeHint, dictSize, mode);
+    return ZSTD_adjustCParams_internal(cParams, srcSizeHint, dictSize, mode, CCtxParams->useRowMatchFinder);
 }

 static size_t
 ZSTD_sizeof_matchState(const ZSTD_compressionParameters* const cParams,
-                       const ZSTD_paramSwitch_e useRowMatchFinder,
-                       const U32 enableDedicatedDictSearch,
+                       const ZSTD_ParamSwitch_e useRowMatchFinder,
+                       const int enableDedicatedDictSearch,
                       const U32 forCCtx)
 {
    /* chain table size should be 0 for fast or row-hash strategies */
@@ -1363,14 +1605,14 @@ ZSTD_sizeof_matchState(const ZSTD_compressionParameters* const cParams,
                            + hSize * sizeof(U32)
                            + h3Size * sizeof(U32);
    size_t const optPotentialSpace =
-        ZSTD_cwksp_aligned_alloc_size((MaxML+1) * sizeof(U32))
-      + ZSTD_cwksp_aligned_alloc_size((MaxLL+1) * sizeof(U32))
-      + ZSTD_cwksp_aligned_alloc_size((MaxOff+1) * sizeof(U32))
-      + ZSTD_cwksp_aligned_alloc_size((1<<Litbits) * sizeof(U32))
-      + ZSTD_cwksp_aligned_alloc_size((ZSTD_OPT_NUM+1) * sizeof(ZSTD_match_t))
-      + ZSTD_cwksp_aligned_alloc_size((ZSTD_OPT_NUM+1) * sizeof(ZSTD_optimal_t));
+        ZSTD_cwksp_aligned64_alloc_size((MaxML+1) * sizeof(U32))
+      + ZSTD_cwksp_aligned64_alloc_size((MaxLL+1) * sizeof(U32))
+      + ZSTD_cwksp_aligned64_alloc_size((MaxOff+1) * sizeof(U32))
+      + ZSTD_cwksp_aligned64_alloc_size((1<<Litbits) * sizeof(U32))
+      + ZSTD_cwksp_aligned64_alloc_size(ZSTD_OPT_SIZE * sizeof(ZSTD_match_t))
+      + ZSTD_cwksp_aligned64_alloc_size(ZSTD_OPT_SIZE * sizeof(ZSTD_optimal_t));
    size_t const lazyAdditionalSpace = ZSTD_rowMatchFinderUsed(cParams->strategy, useRowMatchFinder)
-                                            ? ZSTD_cwksp_aligned_alloc_size(hSize*sizeof(U16))
+                                            ? ZSTD_cwksp_aligned64_alloc_size(hSize)
                                            : 0;
    size_t const optSpace = (forCCtx && (cParams->strategy >= ZSTD_btopt))
                                ? optPotentialSpace
@@ -1386,30 +1628,38 @@ ZSTD_sizeof_matchState(const ZSTD_compressionParameters* const cParams,
    return tableSpace + optSpace + slackSpace + lazyAdditionalSpace;
 }

+/* Helper function for calculating memory requirements.
+ * Gives a tighter bound than ZSTD_sequenceBound() by taking minMatch into account. */
+static size_t ZSTD_maxNbSeq(size_t blockSize, unsigned minMatch, int useSequenceProducer) {
+    U32 const divider = (minMatch==3 || useSequenceProducer) ? 3 : 4;
+    return blockSize / divider;
+}
+
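The divider in ZSTD_maxNbSeq() reflects the smallest sequence the parser can emit: each sequence covers at least minMatch bytes of input (3 when minMatch==3 or when an external sequence producer may emit 3-byte matches, otherwise 4). Worked arithmetic for the default 128 KB block (illustrative values, not from the patch):

    ZSTD_maxNbSeq(131072, 4, 0);  /* -> 131072 / 4 = 32768 sequences */
    ZSTD_maxNbSeq(131072, 3, 0);  /* -> 131072 / 3 = 43690 sequences */

so enabling minMatch==3 or a sequence producer grows the per-block sequence buffers by a third.
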
 static size_t ZSTD_estimateCCtxSize_usingCCtxParams_internal(
        const ZSTD_compressionParameters* cParams,
        const ldmParams_t* ldmParams,
        const int isStatic,
-        const ZSTD_paramSwitch_e useRowMatchFinder,
+        const ZSTD_ParamSwitch_e useRowMatchFinder,
        const size_t buffInSize,
        const size_t buffOutSize,
-        const U64 pledgedSrcSize)
+        const U64 pledgedSrcSize,
+        int useSequenceProducer,
+        size_t maxBlockSize)
 {
    size_t const windowSize = (size_t) BOUNDED(1ULL, 1ULL << cParams->windowLog, pledgedSrcSize);
-    size_t const blockSize = MIN(ZSTD_BLOCKSIZE_MAX, windowSize);
-    U32    const divider = (cParams->minMatch==3) ? 3 : 4;
-    size_t const maxNbSeq = blockSize / divider;
+    size_t const blockSize = MIN(ZSTD_resolveMaxBlockSize(maxBlockSize), windowSize);
+    size_t const maxNbSeq = ZSTD_maxNbSeq(blockSize, cParams->minMatch, useSequenceProducer);
    size_t const tokenSpace = ZSTD_cwksp_alloc_size(WILDCOPY_OVERLENGTH + blockSize)
-                            + ZSTD_cwksp_aligned_alloc_size(maxNbSeq * sizeof(seqDef))
+                            + ZSTD_cwksp_aligned64_alloc_size(maxNbSeq * sizeof(SeqDef))
                            + 3 * ZSTD_cwksp_alloc_size(maxNbSeq * sizeof(BYTE));
-    size_t const entropySpace = ZSTD_cwksp_alloc_size(ENTROPY_WORKSPACE_SIZE);
+    size_t const tmpWorkSpace = ZSTD_cwksp_alloc_size(TMP_WORKSPACE_SIZE);
    size_t const blockStateSpace = 2 * ZSTD_cwksp_alloc_size(sizeof(ZSTD_compressedBlockState_t));
    size_t const matchStateSize = ZSTD_sizeof_matchState(cParams, useRowMatchFinder, /* enableDedicatedDictSearch */ 0, /* forCCtx */ 1);

    size_t const ldmSpace = ZSTD_ldm_getTableSize(*ldmParams);
    size_t const maxNbLdmSeq = ZSTD_ldm_getMaxNbSeq(*ldmParams, blockSize);
    size_t const ldmSeqSpace = ldmParams->enableLdm == ZSTD_ps_enable ?
-        ZSTD_cwksp_aligned_alloc_size(maxNbLdmSeq * sizeof(rawSeq)) : 0;
+        ZSTD_cwksp_aligned64_alloc_size(maxNbLdmSeq * sizeof(rawSeq)) : 0;


    size_t const bufferSpace = ZSTD_cwksp_alloc_size(buffInSize)
@@ -1417,15 +1667,21 @@ static size_t ZSTD_estimateCCtxSize_usingCCtxParams_internal(

    size_t const cctxSpace = isStatic ? ZSTD_cwksp_alloc_size(sizeof(ZSTD_CCtx)) : 0;

+    size_t const maxNbExternalSeq = ZSTD_sequenceBound(blockSize);
+    size_t const externalSeqSpace = useSequenceProducer
+        ? ZSTD_cwksp_aligned64_alloc_size(maxNbExternalSeq * sizeof(ZSTD_Sequence))
+        : 0;
+
    size_t const neededSpace =
        cctxSpace +
-        entropySpace +
+        tmpWorkSpace +
        blockStateSpace +
        ldmSpace +
        ldmSeqSpace +
        matchStateSize +
        tokenSpace +
-        bufferSpace;
+        bufferSpace +
+        externalSeqSpace;

    DEBUGLOG(5, "estimate workspace : %u", (U32)neededSpace);
    return neededSpace;
@@ -1435,7 +1691,7 @@ size_t ZSTD_estimateCCtxSize_usingCCtxParams(const ZSTD_CCtx_params* params)
 {
    ZSTD_compressionParameters const cParams =
                ZSTD_getCParamsFromCCtxParams(params, ZSTD_CONTENTSIZE_UNKNOWN, 0, ZSTD_cpm_noAttachDict);
-    ZSTD_paramSwitch_e const useRowMatchFinder = ZSTD_resolveRowMatchFinderMode(params->useRowMatchFinder,
+    ZSTD_ParamSwitch_e const useRowMatchFinder = ZSTD_resolveRowMatchFinderMode(params->useRowMatchFinder,
                                                                                &cParams);

    RETURN_ERROR_IF(params->nbWorkers > 0, GENERIC, "Estimate CCtx size is supported for single-threaded compression only.");
@@ -1443,7 +1699,7 @@ size_t ZSTD_estimateCCtxSize_usingCCtxParams(const ZSTD_CCtx_params* params)
     * be needed. However, we still allocate two 0-sized buffers, which can
     * take space under ASAN. */
    return ZSTD_estimateCCtxSize_usingCCtxParams_internal(
-        &cParams, &params->ldmParams, 1, useRowMatchFinder, 0, 0, ZSTD_CONTENTSIZE_UNKNOWN);
+        &cParams, &params->ldmParams, 1, useRowMatchFinder, 0, 0, ZSTD_CONTENTSIZE_UNKNOWN, ZSTD_hasExtSeqProd(params), params->maxBlockSize);
 }

 size_t ZSTD_estimateCCtxSize_usingCParams(ZSTD_compressionParameters cParams)
@@ -1493,18 +1749,18 @@ size_t ZSTD_estimateCStreamSize_usingCCtxParams(const ZSTD_CCtx_params* params)
    RETURN_ERROR_IF(params->nbWorkers > 0, GENERIC, "Estimate CCtx size is supported for single-threaded compression only.");
    {   ZSTD_compressionParameters const cParams =
                ZSTD_getCParamsFromCCtxParams(params, ZSTD_CONTENTSIZE_UNKNOWN, 0, ZSTD_cpm_noAttachDict);
-        size_t const blockSize = MIN(ZSTD_BLOCKSIZE_MAX, (size_t)1 << cParams.windowLog);
+        size_t const blockSize = MIN(ZSTD_resolveMaxBlockSize(params->maxBlockSize), (size_t)1 << cParams.windowLog);
        size_t const inBuffSize = (params->inBufferMode == ZSTD_bm_buffered)
                ? ((size_t)1 << cParams.windowLog) + blockSize
                : 0;
        size_t const outBuffSize = (params->outBufferMode == ZSTD_bm_buffered)
                ? ZSTD_compressBound(blockSize) + 1
                : 0;
-        ZSTD_paramSwitch_e const useRowMatchFinder = ZSTD_resolveRowMatchFinderMode(params->useRowMatchFinder, &params->cParams);
+        ZSTD_ParamSwitch_e const useRowMatchFinder = ZSTD_resolveRowMatchFinderMode(params->useRowMatchFinder, &params->cParams);

        return ZSTD_estimateCCtxSize_usingCCtxParams_internal(
            &cParams, &params->ldmParams, 1, useRowMatchFinder, inBuffSize, outBuffSize,
-            ZSTD_CONTENTSIZE_UNKNOWN);
+            ZSTD_CONTENTSIZE_UNKNOWN, ZSTD_hasExtSeqProd(params), params->maxBlockSize);
    }
 }

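These estimators exist so callers can size a workspace up front, e.g. for a statically allocated context. A hedged sketch of the intended pattern, using public API names (user-side code; the kernel would go through its zstd_* wrappers and kmalloc/vmalloc instead):

    ZSTD_CCtx_params* const params = ZSTD_createCCtxParams();
    ZSTD_CCtxParams_setParameter(params, ZSTD_c_compressionLevel, 3);
    {   size_t const wkspSize = ZSTD_estimateCCtxSize_usingCCtxParams(params);
        void* const wksp = malloc(wkspSize);
        ZSTD_CCtx* const cctx = ZSTD_initStaticCCtx(wksp, wkspSize);
        /* ... compress with cctx; no further allocation should occur ... */
    }

Note the new tail arguments: the estimate is only valid if the eventual compression uses the same maxBlockSize and external-sequence-producer settings that were passed in here.
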
@@ -1600,7 +1856,7 @@ void ZSTD_reset_compressedBlockState(ZSTD_compressedBlockState_t* bs)
  * Invalidate all the matches in the match finder tables.
  * Requires nextSrc and base to be set (can be NULL).
  */
-static void ZSTD_invalidateMatchState(ZSTD_matchState_t* ms)
+static void ZSTD_invalidateMatchState(ZSTD_MatchState_t* ms)
 {
    ZSTD_window_clear(&ms->window);

@@ -1637,12 +1893,25 @@ typedef enum {
    ZSTD_resetTarget_CCtx
 } ZSTD_resetTarget_e;

+/* Mixes the bits of a 64-bit value, based on XXH3_rrmxmx */
+static U64 ZSTD_bitmix(U64 val, U64 len) {
+    val ^= ZSTD_rotateRight_U64(val, 49) ^ ZSTD_rotateRight_U64(val, 24);
+    val *= 0x9FB21C651E98DF25ULL;
+    val ^= (val >> 35) + len;
+    val *= 0x9FB21C651E98DF25ULL;
+    return val ^ (val >> 28);
+}
+
+/* Mixes in the hashSalt and hashSaltEntropy to create a new hashSalt */
+static void ZSTD_advanceHashSalt(ZSTD_MatchState_t* ms) {
+    ms->hashSalt = ZSTD_bitmix(ms->hashSalt, 8) ^ ZSTD_bitmix((U64) ms->hashSaltEntropy, 4);
+}

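ZSTD_bitmix() is a small avalanche mixer; advancing the salt through it means each CCtx reset gets a fresh, well-distributed salt for the row matchfinder's tag hashes, so stale tags left behind in an init-once table cannot systematically alias the new session's tags. A self-contained illustration of the recurrence (hypothetical values, assuming the two functions above):

    /* illustrative only: how the salt evolves across two CCtx resets */
    U64 salt = 0;                      /* initial hashSalt                */
    U64 const entropy = 0x12345678u;   /* hypothetical hashSaltEntropy    */
    salt = ZSTD_bitmix(salt, 8) ^ ZSTD_bitmix(entropy, 4);  /* reset #1   */
    salt = ZSTD_bitmix(salt, 8) ^ ZSTD_bitmix(entropy, 4);  /* reset #2:
                                        * different, decorrelated salt    */

This is what lets the CCtx path below skip the tag-table memset that the CDict path still performs (CDicts use salt 0 so their tag tables stay deterministic and copyable).
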
 static size_t
-ZSTD_reset_matchState(ZSTD_matchState_t* ms,
+ZSTD_reset_matchState(ZSTD_MatchState_t* ms,
                      ZSTD_cwksp* ws,
                const ZSTD_compressionParameters* cParams,
-                const ZSTD_paramSwitch_e useRowMatchFinder,
+                const ZSTD_ParamSwitch_e useRowMatchFinder,
                const ZSTD_compResetPolicy_e crp,
                const ZSTD_indexResetPolicy_e forceResetIndex,
                const ZSTD_resetTarget_e forWho)
@@ -1664,6 +1933,7 @@ ZSTD_reset_matchState(ZSTD_matchState_t* ms,
    }

    ms->hashLog3 = hashLog3;
+    ms->lazySkipping = 0;

    ZSTD_invalidateMatchState(ms);

@@ -1685,22 +1955,19 @@ ZSTD_reset_matchState(ZSTD_matchState_t* ms,
        ZSTD_cwksp_clean_tables(ws);
    }

    if (ZSTD_rowMatchFinderUsed(cParams->strategy, useRowMatchFinder)) {
-        {   /* Row match finder needs an additional table of hashes ("tags") */
-            size_t const tagTableSize = hSize*sizeof(U16);
-            ms->tagTable = (U16*)ZSTD_cwksp_reserve_aligned(ws, tagTableSize);
-            if (ms->tagTable) ZSTD_memset(ms->tagTable, 0, tagTableSize);
+        /* Row match finder needs an additional table of hashes ("tags") */
+        size_t const tagTableSize = hSize;
+        /* We want to generate a new salt in case we reset a Cctx, but we always want to use
+         * 0 when we reset a Cdict */
+        if (forWho == ZSTD_resetTarget_CCtx) {
+            ms->tagTable = (BYTE*) ZSTD_cwksp_reserve_aligned_init_once(ws, tagTableSize);
+            ZSTD_advanceHashSalt(ms);
+        } else {
+            /* When we are not salting we want to always memset the memory */
+            ms->tagTable = (BYTE*) ZSTD_cwksp_reserve_aligned64(ws, tagTableSize);
+            ZSTD_memset(ms->tagTable, 0, tagTableSize);
+            ms->hashSalt = 0;
+        }
        {   /* Switch to 32-entry rows if searchLog is 5 (or more) */
            U32 const rowLog = BOUNDED(4, cParams->searchLog, 6);
@@ -1709,6 +1976,17 @@ ZSTD_reset_matchState(ZSTD_matchState_t* ms,
        }
    }

+    /* opt parser space */
+    if ((forWho == ZSTD_resetTarget_CCtx) && (cParams->strategy >= ZSTD_btopt)) {
+        DEBUGLOG(4, "reserving optimal parser space");
+        ms->opt.litFreq = (unsigned*)ZSTD_cwksp_reserve_aligned64(ws, (1<<Litbits) * sizeof(unsigned));
+        ms->opt.litLengthFreq = (unsigned*)ZSTD_cwksp_reserve_aligned64(ws, (MaxLL+1) * sizeof(unsigned));
+        ms->opt.matchLengthFreq = (unsigned*)ZSTD_cwksp_reserve_aligned64(ws, (MaxML+1) * sizeof(unsigned));
+        ms->opt.offCodeFreq = (unsigned*)ZSTD_cwksp_reserve_aligned64(ws, (MaxOff+1) * sizeof(unsigned));
+        ms->opt.matchTable = (ZSTD_match_t*)ZSTD_cwksp_reserve_aligned64(ws, ZSTD_OPT_SIZE * sizeof(ZSTD_match_t));
+        ms->opt.priceTable = (ZSTD_optimal_t*)ZSTD_cwksp_reserve_aligned64(ws, ZSTD_OPT_SIZE * sizeof(ZSTD_optimal_t));
+    }
+
    ms->cParams = *cParams;

    RETURN_ERROR_IF(ZSTD_cwksp_reserve_failed(ws), memory_allocation,
@@ -1754,7 +2032,7 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
 {
    ZSTD_cwksp* const ws = &zc->workspace;
    DEBUGLOG(4, "ZSTD_resetCCtx_internal: pledgedSrcSize=%u, wlog=%u, useRowMatchFinder=%d useBlockSplitter=%d",
-                (U32)pledgedSrcSize, params->cParams.windowLog, (int)params->useRowMatchFinder, (int)params->useBlockSplitter);
+                (U32)pledgedSrcSize, params->cParams.windowLog, (int)params->useRowMatchFinder, (int)params->postBlockSplitter);
    assert(!ZSTD_isError(ZSTD_checkCParams(params->cParams)));

    zc->isFirstBlock = 1;
@@ -1766,8 +2044,9 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
    params = &zc->appliedParams;

    assert(params->useRowMatchFinder != ZSTD_ps_auto);
-    assert(params->useBlockSplitter != ZSTD_ps_auto);
+    assert(params->postBlockSplitter != ZSTD_ps_auto);
    assert(params->ldmParams.enableLdm != ZSTD_ps_auto);
+    assert(params->maxBlockSize != 0);
    if (params->ldmParams.enableLdm == ZSTD_ps_enable) {
        /* Adjust long distance matching parameters */
        ZSTD_ldm_adjustParameters(&zc->appliedParams.ldmParams, &params->cParams);
@@ -1776,9 +2055,8 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
    }

    {   size_t const windowSize = MAX(1, (size_t)MIN(((U64)1 << params->cParams.windowLog), pledgedSrcSize));
-        size_t const blockSize = MIN(ZSTD_BLOCKSIZE_MAX, windowSize);
-        U32    const divider = (params->cParams.minMatch==3) ? 3 : 4;
-        size_t const maxNbSeq = blockSize / divider;
+        size_t const blockSize = MIN(params->maxBlockSize, windowSize);
+        size_t const maxNbSeq = ZSTD_maxNbSeq(blockSize, params->cParams.minMatch, ZSTD_hasExtSeqProd(params));
        size_t const buffOutSize = (zbuff == ZSTDb_buffered && params->outBufferMode == ZSTD_bm_buffered)
                ? ZSTD_compressBound(blockSize) + 1
                : 0;
@@ -1795,8 +2073,7 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
        size_t const neededSpace =
            ZSTD_estimateCCtxSize_usingCCtxParams_internal(
                &params->cParams, &params->ldmParams, zc->staticSize != 0, params->useRowMatchFinder,
-                buffInSize, buffOutSize, pledgedSrcSize);
-        int resizeWorkspace;
+                buffInSize, buffOutSize, pledgedSrcSize, ZSTD_hasExtSeqProd(params), params->maxBlockSize);

        FORWARD_IF_ERROR(neededSpace, "cctx size estimate failed!");

@@ -1805,7 +2082,7 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
        {   /* Check if workspace is large enough, alloc a new one if needed */
            int const workspaceTooSmall = ZSTD_cwksp_sizeof(ws) < neededSpace;
            int const workspaceWasteful = ZSTD_cwksp_check_wasteful(ws, neededSpace);
-            resizeWorkspace = workspaceTooSmall || workspaceWasteful;
+            int resizeWorkspace = workspaceTooSmall || workspaceWasteful;
            DEBUGLOG(4, "Need %zu B workspace", neededSpace);
            DEBUGLOG(4, "windowSize: %zu - blockSize: %zu", windowSize, blockSize);

@@ -1823,21 +2100,23 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,

                DEBUGLOG(5, "reserving object space");
                /* Statically sized space.
-                 * entropyWorkspace never moves,
+                 * tmpWorkspace never moves,
                 * though prev/next block swap places */
                assert(ZSTD_cwksp_check_available(ws, 2 * sizeof(ZSTD_compressedBlockState_t)));
                zc->blockState.prevCBlock = (ZSTD_compressedBlockState_t*) ZSTD_cwksp_reserve_object(ws, sizeof(ZSTD_compressedBlockState_t));
                RETURN_ERROR_IF(zc->blockState.prevCBlock == NULL, memory_allocation, "couldn't allocate prevCBlock");
                zc->blockState.nextCBlock = (ZSTD_compressedBlockState_t*) ZSTD_cwksp_reserve_object(ws, sizeof(ZSTD_compressedBlockState_t));
                RETURN_ERROR_IF(zc->blockState.nextCBlock == NULL, memory_allocation, "couldn't allocate nextCBlock");
-                zc->entropyWorkspace = (U32*) ZSTD_cwksp_reserve_object(ws, ENTROPY_WORKSPACE_SIZE);
-                RETURN_ERROR_IF(zc->entropyWorkspace == NULL, memory_allocation, "couldn't allocate entropyWorkspace");
+                zc->tmpWorkspace = ZSTD_cwksp_reserve_object(ws, TMP_WORKSPACE_SIZE);
+                RETURN_ERROR_IF(zc->tmpWorkspace == NULL, memory_allocation, "couldn't allocate tmpWorkspace");
+                zc->tmpWkspSize = TMP_WORKSPACE_SIZE;
        }   }

        ZSTD_cwksp_clear(ws);

        /* init params */
        zc->blockState.matchState.cParams = params->cParams;
+        zc->blockState.matchState.prefetchCDictTables = params->prefetchCDictTables == ZSTD_ps_enable;
        zc->pledgedSrcSizePlusOne = pledgedSrcSize+1;
        zc->consumedSrcSize = 0;
        zc->producedCSize = 0;
@@ -1845,7 +2124,7 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
            zc->appliedParams.fParams.contentSizeFlag = 0;
        DEBUGLOG(4, "pledged content size : %u ; flag : %u",
            (unsigned)pledgedSrcSize, zc->appliedParams.fParams.contentSizeFlag);
-        zc->blockSize = blockSize;
+        zc->blockSizeMax = blockSize;

        xxh64_reset(&zc->xxhState, 0);
        zc->stage = ZSTDcs_init;
@@ -1854,13 +2133,46 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,

        ZSTD_reset_compressedBlockState(zc->blockState.prevCBlock);

+        FORWARD_IF_ERROR(ZSTD_reset_matchState(
+            &zc->blockState.matchState,
+            ws,
+            &params->cParams,
+            params->useRowMatchFinder,
+            crp,
+            needsIndexReset,
+            ZSTD_resetTarget_CCtx), "");
+
+        zc->seqStore.sequencesStart = (SeqDef*)ZSTD_cwksp_reserve_aligned64(ws, maxNbSeq * sizeof(SeqDef));
+
+        /* ldm hash table */
+        if (params->ldmParams.enableLdm == ZSTD_ps_enable) {
+            /* TODO: avoid memset? */
+            size_t const ldmHSize = ((size_t)1) << params->ldmParams.hashLog;
+            zc->ldmState.hashTable = (ldmEntry_t*)ZSTD_cwksp_reserve_aligned64(ws, ldmHSize * sizeof(ldmEntry_t));
+            ZSTD_memset(zc->ldmState.hashTable, 0, ldmHSize * sizeof(ldmEntry_t));
+            zc->ldmSequences = (rawSeq*)ZSTD_cwksp_reserve_aligned64(ws, maxNbLdmSeq * sizeof(rawSeq));
+            zc->maxNbLdmSequences = maxNbLdmSeq;
+
+            ZSTD_window_init(&zc->ldmState.window);
+            zc->ldmState.loadedDictEnd = 0;
+        }
+
+        /* reserve space for block-level external sequences */
+        if (ZSTD_hasExtSeqProd(params)) {
+            size_t const maxNbExternalSeq = ZSTD_sequenceBound(blockSize);
+            zc->extSeqBufCapacity = maxNbExternalSeq;
+            zc->extSeqBuf =
+                (ZSTD_Sequence*)ZSTD_cwksp_reserve_aligned64(ws, maxNbExternalSeq * sizeof(ZSTD_Sequence));
+        }
+
+        /* buffers */
+
        /* ZSTD_wildcopy() is used to copy into the literals buffer,
         * so we have to oversize the buffer by WILDCOPY_OVERLENGTH bytes.
         */
        zc->seqStore.litStart = ZSTD_cwksp_reserve_buffer(ws, blockSize + WILDCOPY_OVERLENGTH);
        zc->seqStore.maxNbLit = blockSize;

-        /* buffers */
        zc->bufferedPolicy = zbuff;
        zc->inBuffSize = buffInSize;
        zc->inBuff = (char*)ZSTD_cwksp_reserve_buffer(ws, buffInSize);
@@ -1883,32 +2195,9 @@ static size_t ZSTD_resetCCtx_internal(ZSTD_CCtx* zc,
        zc->seqStore.llCode = ZSTD_cwksp_reserve_buffer(ws, maxNbSeq * sizeof(BYTE));
        zc->seqStore.mlCode = ZSTD_cwksp_reserve_buffer(ws, maxNbSeq * sizeof(BYTE));
        zc->seqStore.ofCode = ZSTD_cwksp_reserve_buffer(ws, maxNbSeq * sizeof(BYTE));
-        zc->seqStore.sequencesStart = (seqDef*)ZSTD_cwksp_reserve_aligned(ws, maxNbSeq * sizeof(seqDef));
-
-        FORWARD_IF_ERROR(ZSTD_reset_matchState(
-            &zc->blockState.matchState,
-            ws,
-            &params->cParams,
-            params->useRowMatchFinder,
-            crp,
-            needsIndexReset,
-            ZSTD_resetTarget_CCtx), "");
-
-        /* ldm hash table */
-        if (params->ldmParams.enableLdm == ZSTD_ps_enable) {
-            /* TODO: avoid memset? */
-            size_t const ldmHSize = ((size_t)1) << params->ldmParams.hashLog;
-            zc->ldmState.hashTable = (ldmEntry_t*)ZSTD_cwksp_reserve_aligned(ws, ldmHSize * sizeof(ldmEntry_t));
-            ZSTD_memset(zc->ldmState.hashTable, 0, ldmHSize * sizeof(ldmEntry_t));
-            zc->ldmSequences = (rawSeq*)ZSTD_cwksp_reserve_aligned(ws, maxNbLdmSeq * sizeof(rawSeq));
-            zc->maxNbLdmSequences = maxNbLdmSeq;
-
-            ZSTD_window_init(&zc->ldmState.window);
-            zc->ldmState.loadedDictEnd = 0;
-        }

        DEBUGLOG(3, "wksp: finished allocating, %zd bytes remain available", ZSTD_cwksp_available_space(ws));
-        assert(ZSTD_cwksp_estimated_space_within_bounds(ws, neededSpace, resizeWorkspace));
+        assert(ZSTD_cwksp_estimated_space_within_bounds(ws, neededSpace));

        zc->initialized = 1;

@@ -1980,7 +2269,8 @@ ZSTD_resetCCtx_byAttachingCDict(ZSTD_CCtx* cctx,
        }

        params.cParams = ZSTD_adjustCParams_internal(adjusted_cdict_cParams, pledgedSrcSize,
-                                                     cdict->dictContentSize, ZSTD_cpm_attachDict);
+                                                     cdict->dictContentSize, ZSTD_cpm_attachDict,
+                                                     params.useRowMatchFinder);
        params.cParams.windowLog = windowLog;
        params.useRowMatchFinder = cdict->useRowMatchFinder;    /* cdict overrides */
        FORWARD_IF_ERROR(ZSTD_resetCCtx_internal(cctx, &params, pledgedSrcSize,
@@ -2019,6 +2309,22 @@ ZSTD_resetCCtx_byAttachingCDict(ZSTD_CCtx* cctx,
    return 0;
 }

+static void ZSTD_copyCDictTableIntoCCtx(U32* dst, U32 const* src, size_t tableSize,
+                                        ZSTD_compressionParameters const* cParams) {
+    if (ZSTD_CDictIndicesAreTagged(cParams)){
+        /* Remove tags from the CDict table if they are present.
+         * See docs on "short cache" in zstd_compress_internal.h for context. */
+        size_t i;
+        for (i = 0; i < tableSize; i++) {
+            U32 const taggedIndex = src[i];
+            U32 const index = taggedIndex >> ZSTD_SHORT_CACHE_TAG_BITS;
+            dst[i] = index;
+        }
+    } else {
+        ZSTD_memcpy(dst, src, tableSize * sizeof(U32));
+    }
+}
+
 static size_t ZSTD_resetCCtx_byCopyingCDict(ZSTD_CCtx* cctx,
                            const ZSTD_CDict* cdict,
                            ZSTD_CCtx_params params,
@@ -2054,26 +2360,29 @@ static size_t ZSTD_resetCCtx_byCopyingCDict(ZSTD_CCtx* cctx,
                                : 0;
        size_t const hSize =  (size_t)1 << cdict_cParams->hashLog;

-        ZSTD_memcpy(cctx->blockState.matchState.hashTable,
-               cdict->matchState.hashTable,
-               hSize * sizeof(U32));
+        ZSTD_copyCDictTableIntoCCtx(cctx->blockState.matchState.hashTable,
+                                cdict->matchState.hashTable,
+                                hSize, cdict_cParams);
+
        /* Do not copy cdict's chainTable if cctx has parameters such that it would not use chainTable */
        if (ZSTD_allocateChainTable(cctx->appliedParams.cParams.strategy, cctx->appliedParams.useRowMatchFinder, 0 /* forDDSDict */)) {
-            ZSTD_memcpy(cctx->blockState.matchState.chainTable,
-                   cdict->matchState.chainTable,
-                   chainSize * sizeof(U32));
+            ZSTD_copyCDictTableIntoCCtx(cctx->blockState.matchState.chainTable,
+                                    cdict->matchState.chainTable,
+                                    chainSize, cdict_cParams);
        }
        /* copy tag table */
        if (ZSTD_rowMatchFinderUsed(cdict_cParams->strategy, cdict->useRowMatchFinder)) {
-            size_t const tagTableSize = hSize*sizeof(U16);
+            size_t const tagTableSize = hSize;
            ZSTD_memcpy(cctx->blockState.matchState.tagTable,
-                   cdict->matchState.tagTable,
-                   tagTableSize);
+                        cdict->matchState.tagTable,
+                        tagTableSize);
+            cctx->blockState.matchState.hashSalt = cdict->matchState.hashSalt;
        }
    }

    /* Zero the hashTable3, since the cdict never fills it */
-    {   int const h3log = cctx->blockState.matchState.hashLog3;
+    assert(cctx->blockState.matchState.hashLog3 <= 31);
+    {   U32 const h3log = cctx->blockState.matchState.hashLog3;
        size_t const h3Size = h3log ? ((size_t)1 << h3log) : 0;
        assert(cdict->matchState.hashLog3 == 0);
        ZSTD_memset(cctx->blockState.matchState.hashTable3, 0, h3Size * sizeof(U32));
@@ -2082,8 +2391,8 @@ static size_t ZSTD_resetCCtx_byCopyingCDict(ZSTD_CCtx* cctx,
    ZSTD_cwksp_mark_tables_clean(&cctx->workspace);

    /* copy dictionary offsets */
-    {   ZSTD_matchState_t const* srcMatchState = &cdict->matchState;
-        ZSTD_matchState_t* dstMatchState = &cctx->blockState.matchState;
+    {   ZSTD_MatchState_t const* srcMatchState = &cdict->matchState;
+        ZSTD_MatchState_t* dstMatchState = &cctx->blockState.matchState;
        dstMatchState->window       = srcMatchState->window;
        dstMatchState->nextToUpdate = srcMatchState->nextToUpdate;
        dstMatchState->loadedDictEnd= srcMatchState->loadedDictEnd;
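For context on the tag stripping above: with "short cache", each U32 entry in a CDict's hash/chain tables packs the real match index in the high bits and a small hash tag in the low ZSTD_SHORT_CACHE_TAG_BITS bits, so probes can reject most misses without touching dictionary content. A worked sketch with hypothetical values (assuming 8 tag bits, as upstream uses):

    U32 const taggedIndex = (0x123456u << 8) | 0xABu;   /* packed: 0x123456AB */
    U32 const index       = taggedIndex >> 8;           /* back to 0x123456   */

The CCtx working tables are untagged, hence the shift-only copy in ZSTD_copyCDictTableIntoCCtx().
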
@@ -2141,12 +2450,13 @@ static size_t ZSTD_copyCCtx_internal(ZSTD_CCtx* dstCCtx,
        /* Copy only compression parameters related to tables. */
        params.cParams = srcCCtx->appliedParams.cParams;
        assert(srcCCtx->appliedParams.useRowMatchFinder != ZSTD_ps_auto);
-        assert(srcCCtx->appliedParams.useBlockSplitter != ZSTD_ps_auto);
+        assert(srcCCtx->appliedParams.postBlockSplitter != ZSTD_ps_auto);
        assert(srcCCtx->appliedParams.ldmParams.enableLdm != ZSTD_ps_auto);
        params.useRowMatchFinder = srcCCtx->appliedParams.useRowMatchFinder;
-        params.useBlockSplitter = srcCCtx->appliedParams.useBlockSplitter;
+        params.postBlockSplitter = srcCCtx->appliedParams.postBlockSplitter;
        params.ldmParams = srcCCtx->appliedParams.ldmParams;
        params.fParams = fParams;
+        params.maxBlockSize = srcCCtx->appliedParams.maxBlockSize;
        ZSTD_resetCCtx_internal(dstCCtx, &params, pledgedSrcSize,
                                /* loadedDictSize */ 0,
                                ZSTDcrp_leaveDirty, zbuff);
@@ -2166,7 +2476,7 @@ static size_t ZSTD_copyCCtx_internal(ZSTD_CCtx* dstCCtx,
                              ? ((size_t)1 << srcCCtx->appliedParams.cParams.chainLog)
                              : 0;
        size_t const hSize = (size_t)1 << srcCCtx->appliedParams.cParams.hashLog;
-        int const h3log = srcCCtx->blockState.matchState.hashLog3;
+        U32 const h3log = srcCCtx->blockState.matchState.hashLog3;
        size_t const h3Size = h3log ? ((size_t)1 << h3log) : 0;

        ZSTD_memcpy(dstCCtx->blockState.matchState.hashTable,
@@ -2184,8 +2494,8 @@ static size_t ZSTD_copyCCtx_internal(ZSTD_CCtx* dstCCtx,

    /* copy dictionary offsets */
    {
-        const ZSTD_matchState_t* srcMatchState = &srcCCtx->blockState.matchState;
-        ZSTD_matchState_t* dstMatchState = &dstCCtx->blockState.matchState;
+        const ZSTD_MatchState_t* srcMatchState = &srcCCtx->blockState.matchState;
+        ZSTD_MatchState_t* dstMatchState = &dstCCtx->blockState.matchState;
        dstMatchState->window       = srcMatchState->window;
        dstMatchState->nextToUpdate = srcMatchState->nextToUpdate;
        dstMatchState->loadedDictEnd= srcMatchState->loadedDictEnd;
@@ -2234,7 +2544,7 @@ ZSTD_reduceTable_internal (U32* const table, U32 const size, U32 const reducerValue,
    /* Protect special index values < ZSTD_WINDOW_START_INDEX. */
    U32 const reducerThreshold = reducerValue + ZSTD_WINDOW_START_INDEX;
    assert((size & (ZSTD_ROWSIZE-1)) == 0);  /* multiple of ZSTD_ROWSIZE */
-    assert(size < (1U<<31));  /* can be casted to int */
+    assert(size < (1U<<31));  /* can be cast to int */


    for (rowNb=0 ; rowNb < nbRows ; rowNb++) {
@@ -2267,7 +2577,7 @@ static void ZSTD_reduceTable_btlazy2(U32* const table, U32 const size, U32 const

 /*! ZSTD_reduceIndex() :
 *   rescale all indexes to avoid future overflow (indexes are U32) */
-static void ZSTD_reduceIndex (ZSTD_matchState_t* ms, ZSTD_CCtx_params const* params, const U32 reducerValue)
+static void ZSTD_reduceIndex (ZSTD_MatchState_t* ms, ZSTD_CCtx_params const* params, const U32 reducerValue)
 {
    {   U32 const hSize = (U32)1 << params->cParams.hashLog;
        ZSTD_reduceTable(ms->hashTable, hSize, reducerValue);
@@ -2294,26 +2604,32 @@ static void ZSTD_reduceIndex (ZSTD_matchState_t* ms, ZSTD_CCtx_params const* params,

 /* See doc/zstd_compression_format.md for detailed format description */

-void ZSTD_seqToCodes(const seqStore_t* seqStorePtr)
+int ZSTD_seqToCodes(const SeqStore_t* seqStorePtr)
 {
-    const seqDef* const sequences = seqStorePtr->sequencesStart;
+    const SeqDef* const sequences = seqStorePtr->sequencesStart;
    BYTE* const llCodeTable = seqStorePtr->llCode;
    BYTE* const ofCodeTable = seqStorePtr->ofCode;
    BYTE* const mlCodeTable = seqStorePtr->mlCode;
    U32 const nbSeq = (U32)(seqStorePtr->sequences - seqStorePtr->sequencesStart);
    U32 u;
+    int longOffsets = 0;
    assert(nbSeq <= seqStorePtr->maxNbSeq);
    for (u=0; u<nbSeq; u++) {
        U32 const llv = sequences[u].litLength;
        U32 const ofCode = ZSTD_highbit32(sequences[u].offBase);
        U32 const mlv = sequences[u].mlBase;
        llCodeTable[u] = (BYTE)ZSTD_LLcode(llv);
        ofCodeTable[u] = (BYTE)ofCode;
        mlCodeTable[u] = (BYTE)ZSTD_MLcode(mlv);
        assert(!(MEM_64bits() && ofCode >= STREAM_ACCUMULATOR_MIN));
+        if (MEM_32bits() && ofCode >= STREAM_ACCUMULATOR_MIN)
+            longOffsets = 1;
    }
    if (seqStorePtr->longLengthType==ZSTD_llt_literalLength)
        llCodeTable[seqStorePtr->longLengthPos] = MaxLL;
    if (seqStorePtr->longLengthType==ZSTD_llt_matchLength)
        mlCodeTable[seqStorePtr->longLengthPos] = MaxML;
+    return longOffsets;
 }

 /* ZSTD_useTargetCBlockSize():
@@ -2333,9 +2649,9 @@ static int ZSTD_useTargetCBlockSize(const ZSTD_CCtx_params* cctxParams)
 * Returns 1 if true, 0 otherwise. */
 static int ZSTD_blockSplitterEnabled(ZSTD_CCtx_params* cctxParams)
 {
-    DEBUGLOG(5, "ZSTD_blockSplitterEnabled (useBlockSplitter=%d)", cctxParams->useBlockSplitter);
-    assert(cctxParams->useBlockSplitter != ZSTD_ps_auto);
-    return (cctxParams->useBlockSplitter == ZSTD_ps_enable);
+    DEBUGLOG(5, "ZSTD_blockSplitterEnabled (postBlockSplitter=%d)", cctxParams->postBlockSplitter);
+    assert(cctxParams->postBlockSplitter != ZSTD_ps_auto);
+    return (cctxParams->postBlockSplitter == ZSTD_ps_enable);
 }

 /* Type returned by ZSTD_buildSequencesStatistics containing finalized symbol encoding types
@@ -2347,6 +2663,7 @@ typedef struct {
    U32 MLtype;
    size_t size;
    size_t lastCountSize; /* Accounts for bug in 1.3.4. More detail in ZSTD_entropyCompressSeqStore_internal() */
+    int longOffsets;
 } ZSTD_symbolEncodingTypeStats_t;

 /* ZSTD_buildSequencesStatistics():
@@ -2357,11 +2674,13 @@ typedef struct {
 * entropyWkspSize must be of size at least ENTROPY_WORKSPACE_SIZE - (MaxSeq + 1)*sizeof(U32)
 */
 static ZSTD_symbolEncodingTypeStats_t
-ZSTD_buildSequencesStatistics(seqStore_t* seqStorePtr, size_t nbSeq,
-                        const ZSTD_fseCTables_t* prevEntropy, ZSTD_fseCTables_t* nextEntropy,
-                        BYTE* dst, const BYTE* const dstEnd,
-                        ZSTD_strategy strategy, unsigned* countWorkspace,
-                        void* entropyWorkspace, size_t entropyWkspSize) {
+ZSTD_buildSequencesStatistics(
+                  const SeqStore_t* seqStorePtr, size_t nbSeq,
+                  const ZSTD_fseCTables_t* prevEntropy, ZSTD_fseCTables_t* nextEntropy,
+                  BYTE* dst, const BYTE* const dstEnd,
+                  ZSTD_strategy strategy, unsigned* countWorkspace,
+                  void* entropyWorkspace, size_t entropyWkspSize)
+{
    BYTE* const ostart = dst;
    const BYTE* const oend = dstEnd;
    BYTE* op = ostart;
@@ -2375,7 +2694,7 @@ ZSTD_buildSequencesStatistics(seqStore_t* seqStorePtr, size_t nbSeq,

    stats.lastCountSize = 0;
    /* convert length/distances into codes */
-    ZSTD_seqToCodes(seqStorePtr);
+    stats.longOffsets = ZSTD_seqToCodes(seqStorePtr);
    assert(op <= oend);
    assert(nbSeq != 0); /* ZSTD_selectEncodingType() divides by nbSeq */
    /* build CTable for Literal Lengths */
@@ -2392,7 +2711,7 @@ ZSTD_buildSequencesStatistics(seqStore_t* seqStorePtr, size_t nbSeq,
        assert(!(stats.LLtype < set_compressed && nextEntropy->litlength_repeatMode != FSE_repeat_none)); /* We don't copy tables */
        {   size_t const countSize = ZSTD_buildCTable(
                op, (size_t)(oend - op),
-                CTable_LitLength, LLFSELog, (symbolEncodingType_e)stats.LLtype,
+                CTable_LitLength, LLFSELog, (SymbolEncodingType_e)stats.LLtype,
                countWorkspace, max, llCodeTable, nbSeq,
                LL_defaultNorm, LL_defaultNormLog, MaxLL,
                prevEntropy->litlengthCTable,
@@ -2413,7 +2732,7 @@ ZSTD_buildSequencesStatistics(seqStore_t* seqStorePtr, size_t nbSeq,
        size_t const mostFrequent = HIST_countFast_wksp(
            countWorkspace, &max, ofCodeTable, nbSeq, entropyWorkspace, entropyWkspSize);  /* can't fail */
        /* We can only use the basic table if max <= DefaultMaxOff, otherwise the offsets are too large */
-        ZSTD_defaultPolicy_e const defaultPolicy = (max <= DefaultMaxOff) ? ZSTD_defaultAllowed : ZSTD_defaultDisallowed;
+        ZSTD_DefaultPolicy_e const defaultPolicy = (max <= DefaultMaxOff) ? ZSTD_defaultAllowed : ZSTD_defaultDisallowed;
        DEBUGLOG(5, "Building OF table");
        nextEntropy->offcode_repeatMode = prevEntropy->offcode_repeatMode;
        stats.Offtype = ZSTD_selectEncodingType(&nextEntropy->offcode_repeatMode,
@@ -2424,7 +2743,7 @@ ZSTD_buildSequencesStatistics(seqStore_t* seqStorePtr, size_t nbSeq,
        assert(!(stats.Offtype < set_compressed && nextEntropy->offcode_repeatMode != FSE_repeat_none)); /* We don't copy tables */
        {   size_t const countSize = ZSTD_buildCTable(
                op, (size_t)(oend - op),
-                CTable_OffsetBits, OffFSELog, (symbolEncodingType_e)stats.Offtype,
+                CTable_OffsetBits, OffFSELog, (SymbolEncodingType_e)stats.Offtype,
                countWorkspace, max, ofCodeTable, nbSeq,
                OF_defaultNorm, OF_defaultNormLog, DefaultMaxOff,
                prevEntropy->offcodeCTable,
@@ -2454,7 +2773,7 @@ ZSTD_buildSequencesStatistics(seqStore_t* seqStorePtr, size_t nbSeq,
        assert(!(stats.MLtype < set_compressed && nextEntropy->matchlength_repeatMode != FSE_repeat_none)); /* We don't copy tables */
        {   size_t const countSize = ZSTD_buildCTable(
                op, (size_t)(oend - op),
-                CTable_MatchLength, MLFSELog, (symbolEncodingType_e)stats.MLtype,
+                CTable_MatchLength, MLFSELog, (SymbolEncodingType_e)stats.MLtype,
                countWorkspace, max, mlCodeTable, nbSeq,
                ML_defaultNorm, ML_defaultNormLog, MaxML,
                prevEntropy->matchlengthCTable,
@@ -2480,22 +2799,23 @@ ZSTD_buildSequencesStatistics(seqStore_t* seqStorePtr, size_t nbSeq,
 */
 #define SUSPECT_UNCOMPRESSIBLE_LITERAL_RATIO 20
 MEM_STATIC size_t
-ZSTD_entropyCompressSeqStore_internal(seqStore_t* seqStorePtr,
-                          const ZSTD_entropyCTables_t* prevEntropy,
-                                ZSTD_entropyCTables_t* nextEntropy,
-                          const ZSTD_CCtx_params* cctxParams,
-                          void* dst, size_t dstCapacity,
-                          void* entropyWorkspace, size_t entropyWkspSize,
-                          const int bmi2)
+ZSTD_entropyCompressSeqStore_internal(
+                          void* dst, size_t dstCapacity,
+                          const void* literals, size_t litSize,
+                          const SeqStore_t* seqStorePtr,
+                          const ZSTD_entropyCTables_t* prevEntropy,
+                                ZSTD_entropyCTables_t* nextEntropy,
+                          const ZSTD_CCtx_params* cctxParams,
+                          void* entropyWorkspace, size_t entropyWkspSize,
+                          const int bmi2)
 {
-    const int longOffsets = cctxParams->cParams.windowLog > STREAM_ACCUMULATOR_MIN;
    ZSTD_strategy const strategy = cctxParams->cParams.strategy;
    unsigned* count = (unsigned*)entropyWorkspace;
    FSE_CTable* CTable_LitLength = nextEntropy->fse.litlengthCTable;
    FSE_CTable* CTable_OffsetBits = nextEntropy->fse.offcodeCTable;
    FSE_CTable* CTable_MatchLength = nextEntropy->fse.matchlengthCTable;
-    const seqDef* const sequences = seqStorePtr->sequencesStart;
-    const size_t nbSeq = seqStorePtr->sequences - seqStorePtr->sequencesStart;
+    const SeqDef* const sequences = seqStorePtr->sequencesStart;
+    const size_t nbSeq = (size_t)(seqStorePtr->sequences - seqStorePtr->sequencesStart);
    const BYTE* const ofCodeTable = seqStorePtr->ofCode;
    const BYTE* const llCodeTable = seqStorePtr->llCode;
    const BYTE* const mlCodeTable = seqStorePtr->mlCode;
@@ -2503,29 +2823,28 @@ ZSTD_entropyCompressSeqStore_internal(seqStore_t* seqStorePtr,
    BYTE* const oend = ostart + dstCapacity;
    BYTE* op = ostart;
    size_t lastCountSize;
+    int longOffsets = 0;

    entropyWorkspace = count + (MaxSeq + 1);
    entropyWkspSize -= (MaxSeq + 1) * sizeof(*count);

-    DEBUGLOG(4, "ZSTD_entropyCompressSeqStore_internal (nbSeq=%zu)", nbSeq);
+    DEBUGLOG(5, "ZSTD_entropyCompressSeqStore_internal (nbSeq=%zu, dstCapacity=%zu)", nbSeq, dstCapacity);
    ZSTD_STATIC_ASSERT(HUF_WORKSPACE_SIZE >= (1<<MAX(MLFSELog,LLFSELog)));
    assert(entropyWkspSize >= HUF_WORKSPACE_SIZE);

    /* Compress literals */
-    {   const BYTE* const literals = seqStorePtr->litStart;
-        size_t const numSequences = seqStorePtr->sequences - seqStorePtr->sequencesStart;
-        size_t const numLiterals = seqStorePtr->lit - seqStorePtr->litStart;
+    {   size_t const numSequences = (size_t)(seqStorePtr->sequences - seqStorePtr->sequencesStart);
        /* Base suspicion of uncompressibility on ratio of literals to sequences */
-        unsigned const suspectUncompressible = (numSequences == 0) || (numLiterals / numSequences >= SUSPECT_UNCOMPRESSIBLE_LITERAL_RATIO);
-        size_t const litSize = (size_t)(seqStorePtr->lit - literals);
+        int const suspectUncompressible = (numSequences == 0) || (litSize / numSequences >= SUSPECT_UNCOMPRESSIBLE_LITERAL_RATIO);
+
        size_t const cSize = ZSTD_compressLiterals(
-                                    &prevEntropy->huf, &nextEntropy->huf,
-                                    cctxParams->cParams.strategy,
-                                    ZSTD_literalsCompressionIsDisabled(cctxParams),
                                    op, dstCapacity,
                                    literals, litSize,
                                    entropyWorkspace, entropyWkspSize,
-                                    bmi2, suspectUncompressible);
+                                    &prevEntropy->huf, &nextEntropy->huf,
+                                    cctxParams->cParams.strategy,
+                                    ZSTD_literalsCompressionIsDisabled(cctxParams),
+                                    suspectUncompressible, bmi2);
        FORWARD_IF_ERROR(cSize, "ZSTD_compressLiterals failed");
        assert(cSize <= dstCapacity);
        op += cSize;
@@ -2551,11 +2870,10 @@ ZSTD_entropyCompressSeqStore_internal(seqStore_t* seqStorePtr,
        ZSTD_memcpy(&nextEntropy->fse, &prevEntropy->fse, sizeof(prevEntropy->fse));
        return (size_t)(op - ostart);
    }
-    {
-        ZSTD_symbolEncodingTypeStats_t stats;
-        BYTE* seqHead = op++;
+    {   BYTE* const seqHead = op++;
        /* build stats for sequences */
-        stats = ZSTD_buildSequencesStatistics(seqStorePtr, nbSeq,
+        const ZSTD_symbolEncodingTypeStats_t stats =
+                ZSTD_buildSequencesStatistics(seqStorePtr, nbSeq,
                                             &prevEntropy->fse, &nextEntropy->fse,
                                             op, oend,
                                             strategy, count,
@@ -2564,6 +2882,7 @@ ZSTD_entropyCompressSeqStore_internal(seqStore_t* seqStorePtr,
        *seqHead = (BYTE)((stats.LLtype<<6) + (stats.Offtype<<4) + (stats.MLtype<<2));
        lastCountSize = stats.lastCountSize;
        op += stats.size;
+        longOffsets = stats.longOffsets;
    }

    {   size_t const bitstreamSize = ZSTD_encodeSequences(
@@ -2597,104 +2916,146 @@ ZSTD_entropyCompressSeqStore_internal(seqStore_t* seqStorePtr,
    return (size_t)(op - ostart);
 }

-MEM_STATIC size_t
-ZSTD_entropyCompressSeqStore(seqStore_t* seqStorePtr,
-                       const ZSTD_entropyCTables_t* prevEntropy,
-                             ZSTD_entropyCTables_t* nextEntropy,
-                       const ZSTD_CCtx_params* cctxParams,
-                             void* dst, size_t dstCapacity,
-                             size_t srcSize,
-                             void* entropyWorkspace, size_t entropyWkspSize,
-                             int bmi2)
+static size_t
+ZSTD_entropyCompressSeqStore_wExtLitBuffer(
+                    void* dst, size_t dstCapacity,
+                    const void* literals, size_t litSize,
+                    size_t blockSize,
+                    const SeqStore_t* seqStorePtr,
+                    const ZSTD_entropyCTables_t* prevEntropy,
+                    ZSTD_entropyCTables_t* nextEntropy,
+                    const ZSTD_CCtx_params* cctxParams,
+                    void* entropyWorkspace, size_t entropyWkspSize,
+                    int bmi2)
 {
    size_t const cSize = ZSTD_entropyCompressSeqStore_internal(
-                            seqStorePtr, prevEntropy, nextEntropy, cctxParams,
                            dst, dstCapacity,
+                            literals, litSize,
+                            seqStorePtr, prevEntropy, nextEntropy, cctxParams,
                            entropyWorkspace, entropyWkspSize, bmi2);
    if (cSize == 0) return 0;
    /* When srcSize <= dstCapacity, there is enough space to write a raw uncompressed block.
     * Since we ran out of space, the block must not be compressible, so fall back to a raw uncompressed block.
     */
-    if ((cSize == ERROR(dstSize_tooSmall)) & (srcSize <= dstCapacity))
+    if ((cSize == ERROR(dstSize_tooSmall)) & (blockSize <= dstCapacity)) {
+        DEBUGLOG(4, "not enough dstCapacity (%zu) for ZSTD_entropyCompressSeqStore_internal() => do not compress block", dstCapacity);
        return 0;  /* block not compressed */
+    }
    FORWARD_IF_ERROR(cSize, "ZSTD_entropyCompressSeqStore_internal failed");

    /* Check compressibility */
-    {   size_t const maxCSize = srcSize - ZSTD_minGain(srcSize, cctxParams->cParams.strategy);
+    {   size_t const maxCSize = blockSize - ZSTD_minGain(blockSize, cctxParams->cParams.strategy);
        if (cSize >= maxCSize) return 0;  /* block not compressed */
    }
-    DEBUGLOG(4, "ZSTD_entropyCompressSeqStore() cSize: %zu", cSize);
+    DEBUGLOG(5, "ZSTD_entropyCompressSeqStore() cSize: %zu", cSize);
+    /* libzstd decoders older than v1.5.4 are not compatible with compressed blocks of size ZSTD_BLOCKSIZE_MAX exactly.
+     * This restriction is indirectly already fulfilled by respecting the ZSTD_minGain() condition above.
+     */
+    assert(cSize < ZSTD_BLOCKSIZE_MAX);
    return cSize;
 }

+static size_t
+ZSTD_entropyCompressSeqStore(
+                    const SeqStore_t* seqStorePtr,
+                    const ZSTD_entropyCTables_t* prevEntropy,
+                    ZSTD_entropyCTables_t* nextEntropy,
+                    const ZSTD_CCtx_params* cctxParams,
+                    void* dst, size_t dstCapacity,
+                    size_t srcSize,
+                    void* entropyWorkspace, size_t entropyWkspSize,
+                    int bmi2)
+{
+    return ZSTD_entropyCompressSeqStore_wExtLitBuffer(
+            dst, dstCapacity,
+            seqStorePtr->litStart, (size_t)(seqStorePtr->lit - seqStorePtr->litStart),
+            srcSize,
+            seqStorePtr,
+            prevEntropy, nextEntropy,
+            cctxParams,
+            entropyWorkspace, entropyWkspSize,
+            bmi2);
+}
+
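The compressibility check above requires an entropy-coded block to beat a raw block by a margin before it is kept. Worked arithmetic, assuming upstream's ZSTD_minGain() is roughly srcSize/64 plus a small constant for the non-btultra strategies (illustrative, not from this patch):

    size_t const blockSize = 131072;                        /* 128 KB    */
    size_t const maxCSize  = blockSize - (blockSize >> 6);  /* ~129024 B */
    /* cSize >= maxCSize => emit the block raw instead of compressed     */

This margin is also what keeps cSize strictly below ZSTD_BLOCKSIZE_MAX, which the new assert documents for pre-1.5.4 decoder compatibility.
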
 /* ZSTD_selectBlockCompressor() :
 * Not static, but internal use only (used by long distance matcher)
 * assumption : strat is a valid strategy */
-ZSTD_blockCompressor ZSTD_selectBlockCompressor(ZSTD_strategy strat, ZSTD_paramSwitch_e useRowMatchFinder, ZSTD_dictMode_e dictMode)
+ZSTD_BlockCompressor_f ZSTD_selectBlockCompressor(ZSTD_strategy strat, ZSTD_ParamSwitch_e useRowMatchFinder, ZSTD_dictMode_e dictMode)
 {
-    static const ZSTD_blockCompressor blockCompressor[4][ZSTD_STRATEGY_MAX+1] = {
+    static const ZSTD_BlockCompressor_f blockCompressor[4][ZSTD_STRATEGY_MAX+1] = {
        { ZSTD_compressBlock_fast  /* default for 0 */,
          ZSTD_compressBlock_fast,
-          ZSTD_compressBlock_doubleFast,
-          ZSTD_compressBlock_greedy,
-          ZSTD_compressBlock_lazy,
-          ZSTD_compressBlock_lazy2,
-          ZSTD_compressBlock_btlazy2,
-          ZSTD_compressBlock_btopt,
-          ZSTD_compressBlock_btultra,
-          ZSTD_compressBlock_btultra2 },
+          ZSTD_COMPRESSBLOCK_DOUBLEFAST,
+          ZSTD_COMPRESSBLOCK_GREEDY,
+          ZSTD_COMPRESSBLOCK_LAZY,
+          ZSTD_COMPRESSBLOCK_LAZY2,
+          ZSTD_COMPRESSBLOCK_BTLAZY2,
+          ZSTD_COMPRESSBLOCK_BTOPT,
+          ZSTD_COMPRESSBLOCK_BTULTRA,
+          ZSTD_COMPRESSBLOCK_BTULTRA2
+        },
        { ZSTD_compressBlock_fast_extDict  /* default for 0 */,
          ZSTD_compressBlock_fast_extDict,
-          ZSTD_compressBlock_doubleFast_extDict,
-          ZSTD_compressBlock_greedy_extDict,
-          ZSTD_compressBlock_lazy_extDict,
-          ZSTD_compressBlock_lazy2_extDict,
-          ZSTD_compressBlock_btlazy2_extDict,
-          ZSTD_compressBlock_btopt_extDict,
-          ZSTD_compressBlock_btultra_extDict,
-          ZSTD_compressBlock_btultra_extDict },
+          ZSTD_COMPRESSBLOCK_DOUBLEFAST_EXTDICT,
+          ZSTD_COMPRESSBLOCK_GREEDY_EXTDICT,
+          ZSTD_COMPRESSBLOCK_LAZY_EXTDICT,
+          ZSTD_COMPRESSBLOCK_LAZY2_EXTDICT,
+          ZSTD_COMPRESSBLOCK_BTLAZY2_EXTDICT,
+          ZSTD_COMPRESSBLOCK_BTOPT_EXTDICT,
+          ZSTD_COMPRESSBLOCK_BTULTRA_EXTDICT,
+          ZSTD_COMPRESSBLOCK_BTULTRA_EXTDICT
+        },
        { ZSTD_compressBlock_fast_dictMatchState  /* default for 0 */,
          ZSTD_compressBlock_fast_dictMatchState,
-          ZSTD_compressBlock_doubleFast_dictMatchState,
-          ZSTD_compressBlock_greedy_dictMatchState,
-          ZSTD_compressBlock_lazy_dictMatchState,
-          ZSTD_compressBlock_lazy2_dictMatchState,
-          ZSTD_compressBlock_btlazy2_dictMatchState,
-          ZSTD_compressBlock_btopt_dictMatchState,
-          ZSTD_compressBlock_btultra_dictMatchState,
-          ZSTD_compressBlock_btultra_dictMatchState },
+          ZSTD_COMPRESSBLOCK_DOUBLEFAST_DICTMATCHSTATE,
+          ZSTD_COMPRESSBLOCK_GREEDY_DICTMATCHSTATE,
+          ZSTD_COMPRESSBLOCK_LAZY_DICTMATCHSTATE,
+          ZSTD_COMPRESSBLOCK_LAZY2_DICTMATCHSTATE,
+          ZSTD_COMPRESSBLOCK_BTLAZY2_DICTMATCHSTATE,
+          ZSTD_COMPRESSBLOCK_BTOPT_DICTMATCHSTATE,
+          ZSTD_COMPRESSBLOCK_BTULTRA_DICTMATCHSTATE,
+          ZSTD_COMPRESSBLOCK_BTULTRA_DICTMATCHSTATE
+        },
        { NULL  /* default for 0 */,
          NULL,
          NULL,
-          ZSTD_compressBlock_greedy_dedicatedDictSearch,
-          ZSTD_compressBlock_lazy_dedicatedDictSearch,
-          ZSTD_compressBlock_lazy2_dedicatedDictSearch,
+          ZSTD_COMPRESSBLOCK_GREEDY_DEDICATEDDICTSEARCH,
+          ZSTD_COMPRESSBLOCK_LAZY_DEDICATEDDICTSEARCH,
+          ZSTD_COMPRESSBLOCK_LAZY2_DEDICATEDDICTSEARCH,
          NULL,
          NULL,
          NULL,
          NULL }
    };
-    ZSTD_blockCompressor selectedCompressor;
+    ZSTD_BlockCompressor_f selectedCompressor;
    ZSTD_STATIC_ASSERT((unsigned)ZSTD_fast == 1);

-    assert(ZSTD_cParam_withinBounds(ZSTD_c_strategy, strat));
-    DEBUGLOG(4, "Selected block compressor: dictMode=%d strat=%d rowMatchfinder=%d", (int)dictMode, (int)strat, (int)useRowMatchFinder);
+    assert(ZSTD_cParam_withinBounds(ZSTD_c_strategy, (int)strat));
+    DEBUGLOG(5, "Selected block compressor: dictMode=%d strat=%d rowMatchfinder=%d", (int)dictMode, (int)strat, (int)useRowMatchFinder);
    if (ZSTD_rowMatchFinderUsed(strat, useRowMatchFinder)) {
-        static const ZSTD_blockCompressor rowBasedBlockCompressors[4][3] = {
-            { ZSTD_compressBlock_greedy_row,
-            ZSTD_compressBlock_lazy_row,
-            ZSTD_compressBlock_lazy2_row },
-            { ZSTD_compressBlock_greedy_extDict_row,
-            ZSTD_compressBlock_lazy_extDict_row,
-            ZSTD_compressBlock_lazy2_extDict_row },
-            { ZSTD_compressBlock_greedy_dictMatchState_row,
-            ZSTD_compressBlock_lazy_dictMatchState_row,
-            ZSTD_compressBlock_lazy2_dictMatchState_row },
-            { ZSTD_compressBlock_greedy_dedicatedDictSearch_row,
-            ZSTD_compressBlock_lazy_dedicatedDictSearch_row,
-            ZSTD_compressBlock_lazy2_dedicatedDictSearch_row }
+        static const ZSTD_BlockCompressor_f rowBasedBlockCompressors[4][3] = {
+            {
+                ZSTD_COMPRESSBLOCK_GREEDY_ROW,
+                ZSTD_COMPRESSBLOCK_LAZY_ROW,
+                ZSTD_COMPRESSBLOCK_LAZY2_ROW
+            },
+            {
+                ZSTD_COMPRESSBLOCK_GREEDY_EXTDICT_ROW,
+                ZSTD_COMPRESSBLOCK_LAZY_EXTDICT_ROW,
+                ZSTD_COMPRESSBLOCK_LAZY2_EXTDICT_ROW
+            },
+            {
+                ZSTD_COMPRESSBLOCK_GREEDY_DICTMATCHSTATE_ROW,
+                ZSTD_COMPRESSBLOCK_LAZY_DICTMATCHSTATE_ROW,
+                ZSTD_COMPRESSBLOCK_LAZY2_DICTMATCHSTATE_ROW
+            },
+            {
+                ZSTD_COMPRESSBLOCK_GREEDY_DEDICATEDDICTSEARCH_ROW,
+                ZSTD_COMPRESSBLOCK_LAZY_DEDICATEDDICTSEARCH_ROW,
+                ZSTD_COMPRESSBLOCK_LAZY2_DEDICATEDDICTSEARCH_ROW
+            }
        };
-        DEBUGLOG(4, "Selecting a row-based matchfinder");
+        DEBUGLOG(5, "Selecting a row-based matchfinder");
        assert(useRowMatchFinder != ZSTD_ps_auto);
        selectedCompressor = rowBasedBlockCompressors[(int)dictMode][(int)strat - (int)ZSTD_greedy];
    } else {
@@ -2704,30 +3065,126 @@ ZSTD_blockCompressor ZSTD_selectBlockCompressor(ZSTD_strategy strat, ZSTD_paramS
    return selectedCompressor;
 }

-static void ZSTD_storeLastLiterals(seqStore_t* seqStorePtr,
+static void ZSTD_storeLastLiterals(SeqStore_t* seqStorePtr,
                                   const BYTE* anchor, size_t lastLLSize)
 {
    ZSTD_memcpy(seqStorePtr->lit, anchor, lastLLSize);
    seqStorePtr->lit += lastLLSize;
 }

-void ZSTD_resetSeqStore(seqStore_t* ssPtr)
+void ZSTD_resetSeqStore(SeqStore_t* ssPtr)
 {
    ssPtr->lit = ssPtr->litStart;
    ssPtr->sequences = ssPtr->sequencesStart;
    ssPtr->longLengthType = ZSTD_llt_none;
 }

-typedef enum { ZSTDbss_compress, ZSTDbss_noCompress } ZSTD_buildSeqStore_e;
+/* ZSTD_postProcessSequenceProducerResult() :
+ * Validates and post-processes sequences obtained through the external matchfinder API:
+ *   - Checks whether nbExternalSeqs represents an error condition.
+ *   - Appends a block delimiter to outSeqs if one is not already present.
+ *     See zstd.h for context regarding block delimiters.
+ * Returns the number of sequences after post-processing, or an error code. */
+static size_t ZSTD_postProcessSequenceProducerResult(
+    ZSTD_Sequence* outSeqs, size_t nbExternalSeqs, size_t outSeqsCapacity, size_t srcSize
+) {
+    RETURN_ERROR_IF(
+        nbExternalSeqs > outSeqsCapacity,
+        sequenceProducer_failed,
+        "External sequence producer returned error code %lu",
+        (unsigned long)nbExternalSeqs
+    );
+
+    RETURN_ERROR_IF(
+        nbExternalSeqs == 0 && srcSize > 0,
+        sequenceProducer_failed,
+        "Got zero sequences from external sequence producer for a non-empty src buffer!"
+    );
+
+    if (srcSize == 0) {
+        ZSTD_memset(&outSeqs[0], 0, sizeof(ZSTD_Sequence));
+        return 1;
+    }
+
+    {
+        ZSTD_Sequence const lastSeq = outSeqs[nbExternalSeqs - 1];
+
+        /* We can return early if lastSeq is already a block delimiter. */
+        if (lastSeq.offset == 0 && lastSeq.matchLength == 0) {
+            return nbExternalSeqs;
+        }
+
+        /* This error condition is only possible if the external matchfinder
+         * produced an invalid parse, by definition of ZSTD_sequenceBound(). */
+        RETURN_ERROR_IF(
+            nbExternalSeqs == outSeqsCapacity,
+            sequenceProducer_failed,
+            "nbExternalSeqs == outSeqsCapacity but lastSeq is not a block delimiter!"
+        );
+
+        /* lastSeq is not a block delimiter, so we need to append one. */
+        ZSTD_memset(&outSeqs[nbExternalSeqs], 0, sizeof(ZSTD_Sequence));
+        return nbExternalSeqs + 1;
+    }
+}
+
+/* ZSTD_fastSequenceLengthSum() :
+ * Returns sum(litLen) + sum(matchLen) + lastLits for *seqBuf*.
+ * Similar to another function in zstd_compress.c (determine_blockSize),
+ * except it doesn't check for a block delimiter to end summation.
+ * Removing the early exit allows the compiler to auto-vectorize (https://godbolt.org/z/cY1cajz9P).
+ * This function can be deleted and replaced by determine_blockSize after we resolve issue #3456. */
+static size_t ZSTD_fastSequenceLengthSum(ZSTD_Sequence const* seqBuf, size_t seqBufSize) {
+    size_t matchLenSum, litLenSum, i;
+    matchLenSum = 0;
+    litLenSum = 0;
+    for (i = 0; i < seqBufSize; i++) {
+        litLenSum += seqBuf[i].litLength;
+        matchLenSum += seqBuf[i].matchLength;
+    }
+    return litLenSum + matchLenSum;
+}
+
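The helpers above service upstream's external sequence producer interface: a user-supplied matchfinder fills a ZSTD_Sequence buffer, and zstd validates and normalizes the result. A hedged sketch of how a producer is wired in with the upstream experimental API (user-side code; the QAT acceleration mentioned in the cover letter would sit behind such a callback):

    static size_t myProducer(void* state,
                             ZSTD_Sequence* outSeqs, size_t outSeqsCapacity,
                             const void* src, size_t srcSize,
                             const void* dict, size_t dictSize,
                             int compressionLevel, size_t windowSize)
    {
        /* fill outSeqs[] with a valid parse of src, returning the count;
         * returning a value > outSeqsCapacity signals failure */
        (void)state; (void)dict; (void)dictSize;
        (void)compressionLevel; (void)windowSize;
        return ZSTD_SEQUENCE_PRODUCER_ERROR;   /* placeholder body */
    }

    ZSTD_registerSequenceProducer(cctx, NULL /* state */, myProducer);

On success the sequences are copied into the seqStore by ZSTD_transferSequences_wBlockDelim(); on failure, behavior depends on ZSTD_c_enableSeqProducerFallback, as the block-compression path below shows.
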
+
+/* ZSTD_fastSequenceLengthSum() :
+ * Returns sum(litLen) + sum(matchLen) + lastLits for *seqBuf*.
+ * Similar to another function in zstd_compress.c (determine_blockSize),
+ * except it doesn't check for a block delimiter to end summation.
+ * Removing the early exit allows the compiler to auto-vectorize (https://godbolt.org/z/cY1cajz9P).
+ * This function can be deleted and replaced by determine_blockSize after we resolve issue #3456. */
+static size_t ZSTD_fastSequenceLengthSum(ZSTD_Sequence const* seqBuf, size_t seqBufSize) {
+    size_t matchLenSum, litLenSum, i;
+    matchLenSum = 0;
+    litLenSum = 0;
+    for (i = 0; i < seqBufSize; i++) {
+        litLenSum += seqBuf[i].litLength;
+        matchLenSum += seqBuf[i].matchLength;
+    }
+    return litLenSum + matchLenSum;
+}
+
+/*
+ * Function to validate sequences produced by a block compressor.
+ */
+static void ZSTD_validateSeqStore(const SeqStore_t* seqStore, const ZSTD_compressionParameters* cParams)
+{
+#if DEBUGLEVEL >= 1
+    const SeqDef* seq = seqStore->sequencesStart;
+    const SeqDef* const seqEnd = seqStore->sequences;
+    size_t const matchLenLowerBound = cParams->minMatch == 3 ? 3 : 4;
+    for (; seq < seqEnd; ++seq) {
+        const ZSTD_SequenceLength seqLength = ZSTD_getSequenceLength(seqStore, seq);
+        assert(seqLength.matchLength >= matchLenLowerBound);
+        (void)seqLength;
+        (void)matchLenLowerBound;
+    }
+#else
+    (void)seqStore;
+    (void)cParams;
+#endif
+}
+
+static size_t
+ZSTD_transferSequences_wBlockDelim(ZSTD_CCtx* cctx,
+                                   ZSTD_SequencePosition* seqPos,
+                                   const ZSTD_Sequence* const inSeqs, size_t inSeqsSize,
+                                   const void* src, size_t blockSize,
+                                   ZSTD_ParamSwitch_e externalRepSearch);
+
+typedef enum { ZSTDbss_compress, ZSTDbss_noCompress } ZSTD_BuildSeqStore_e;

static size_t ZSTD_buildSeqStore(ZSTD_CCtx* zc, const void* src, size_t srcSize)
{
-   ZSTD_matchState_t* const ms = &zc->blockState.matchState;
+   ZSTD_MatchState_t* const ms = &zc->blockState.matchState;
    DEBUGLOG(5, "ZSTD_buildSeqStore (srcSize=%zu)", srcSize);
    assert(srcSize <= ZSTD_BLOCKSIZE_MAX);
    /* Assert that we have correctly flushed the ctx params into the ms's copy */
    ZSTD_assertEqualCParams(zc->appliedParams.cParams, ms->cParams);
-   if (srcSize < MIN_CBLOCK_SIZE+ZSTD_blockHeaderSize+1) {
+   /* TODO: See 3090. We reduced MIN_CBLOCK_SIZE from 3 to 2 so to compensate we are adding
+    * additional 1. We need to revisit and change this logic to be more consistent */
+   if (srcSize < MIN_CBLOCK_SIZE+ZSTD_blockHeaderSize+1+1) {
        if (zc->appliedParams.cParams.strategy >= ZSTD_btopt) {
            ZSTD_ldm_skipRawSeqStoreBytes(&zc->externSeqStore, srcSize);
        } else {
@@ -2763,6 +3220,15 @@ static size_t ZSTD_buildSeqStore(ZSTD_CCtx* zc, const void* src, size_t srcSize)
    }
    if (zc->externSeqStore.pos < zc->externSeqStore.size) {
        assert(zc->appliedParams.ldmParams.enableLdm == ZSTD_ps_disable);
+
+       /* External matchfinder + LDM is technically possible, just not implemented yet.
+        * We need to revisit soon and implement it. */
+       RETURN_ERROR_IF(
+           ZSTD_hasExtSeqProd(&zc->appliedParams),
+           parameter_combination_unsupported,
+           "Long-distance matching with external sequence producer enabled is not currently supported."
+       );
+
        /* Updates ldmSeqStore.pos */
        lastLLSize =
            ZSTD_ldm_blockCompress(&zc->externSeqStore,
@@ -2772,7 +3238,15 @@ static size_t ZSTD_buildSeqStore(ZSTD_CCtx* zc, const void* src, size_t srcSize)
                                   src, srcSize);
        assert(zc->externSeqStore.pos <= zc->externSeqStore.size);
    } else if (zc->appliedParams.ldmParams.enableLdm == ZSTD_ps_enable) {
-       rawSeqStore_t ldmSeqStore = kNullRawSeqStore;
+       RawSeqStore_t ldmSeqStore = kNullRawSeqStore;
+
+       /* External matchfinder + LDM is technically possible, just not implemented yet.
+        * We need to revisit soon and implement it. */
+       RETURN_ERROR_IF(
+           ZSTD_hasExtSeqProd(&zc->appliedParams),
+           parameter_combination_unsupported,
+           "Long-distance matching with external sequence producer enabled is not currently supported."
+       );

        ldmSeqStore.seq = zc->ldmSequences;
        ldmSeqStore.capacity = zc->maxNbLdmSequences;
@@ -2788,42 +3262,116 @@ static size_t ZSTD_buildSeqStore(ZSTD_CCtx* zc, const void* src, size_t srcSize)
                               zc->appliedParams.useRowMatchFinder,
                               src, srcSize);
        assert(ldmSeqStore.pos == ldmSeqStore.size);
-   } else {   /* not long range mode */
-       ZSTD_blockCompressor const blockCompressor = ZSTD_selectBlockCompressor(zc->appliedParams.cParams.strategy,
-                                                                               zc->appliedParams.useRowMatchFinder,
-                                                                               dictMode);
+   } else if (ZSTD_hasExtSeqProd(&zc->appliedParams)) {
+       assert(
+           zc->extSeqBufCapacity >= ZSTD_sequenceBound(srcSize)
+       );
+       assert(zc->appliedParams.extSeqProdFunc != NULL);
+
+       {   U32 const windowSize = (U32)1 << zc->appliedParams.cParams.windowLog;
+
+           size_t const nbExternalSeqs = (zc->appliedParams.extSeqProdFunc)(
+               zc->appliedParams.extSeqProdState,
+               zc->extSeqBuf,
+               zc->extSeqBufCapacity,
+               src, srcSize,
+               NULL, 0,  /* dict and dictSize, currently not supported */
+               zc->appliedParams.compressionLevel,
+               windowSize
+           );
+
+           size_t const nbPostProcessedSeqs = ZSTD_postProcessSequenceProducerResult(
+               zc->extSeqBuf,
+               nbExternalSeqs,
+               zc->extSeqBufCapacity,
+               srcSize
+           );
+
+           /* Return early if there is no error, since we don't need to worry about last literals */
+           if (!ZSTD_isError(nbPostProcessedSeqs)) {
+               ZSTD_SequencePosition seqPos = {0,0,0};
+               size_t const seqLenSum = ZSTD_fastSequenceLengthSum(zc->extSeqBuf, nbPostProcessedSeqs);
+               RETURN_ERROR_IF(seqLenSum > srcSize, externalSequences_invalid, "External sequences imply too large a block!");
+               FORWARD_IF_ERROR(
+                   ZSTD_transferSequences_wBlockDelim(
+                       zc, &seqPos,
+                       zc->extSeqBuf, nbPostProcessedSeqs,
+                       src, srcSize,
+                       zc->appliedParams.searchForExternalRepcodes
+                   ),
+                   "Failed to copy external sequences to seqStore!"
+               );
+               ms->ldmSeqStore = NULL;
+               DEBUGLOG(5, "Copied %lu sequences from external sequence producer to internal seqStore.", (unsigned long)nbExternalSeqs);
+               return ZSTDbss_compress;
+           }
+
+           /* Propagate the error if fallback is disabled */
+           if (!zc->appliedParams.enableMatchFinderFallback) {
+               return nbPostProcessedSeqs;
+           }
+
+           /* Fallback to software matchfinder */
+           {   ZSTD_BlockCompressor_f const blockCompressor =
+                   ZSTD_selectBlockCompressor(
+                       zc->appliedParams.cParams.strategy,
+                       zc->appliedParams.useRowMatchFinder,
+                       dictMode);
+               ms->ldmSeqStore = NULL;
+               DEBUGLOG(
+                   5,
+                   "External sequence producer returned error code %lu. Falling back to internal parser.",
+                   (unsigned long)nbExternalSeqs
+               );
+               lastLLSize = blockCompressor(ms, &zc->seqStore, zc->blockState.nextCBlock->rep, src, srcSize);
+           }
+       }
+   } else {   /* not long range mode and no external matchfinder */
+       ZSTD_BlockCompressor_f const blockCompressor = ZSTD_selectBlockCompressor(
+           zc->appliedParams.cParams.strategy,
+           zc->appliedParams.useRowMatchFinder,
+           dictMode);
        ms->ldmSeqStore = NULL;
        lastLLSize = blockCompressor(ms, &zc->seqStore, zc->blockState.nextCBlock->rep, src, srcSize);
    }
    {   const BYTE* const lastLiterals = (const BYTE*)src + srcSize - lastLLSize;
        ZSTD_storeLastLiterals(&zc->seqStore, lastLiterals, lastLLSize);
    }
+   ZSTD_validateSeqStore(&zc->seqStore, &zc->appliedParams.cParams);
    return ZSTDbss_compress;
}

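When the producer fails, enableMatchFinderFallback decides between propagating the error and silently re-running the internal parser; LDM and external producers are mutually exclusive for now, as the two RETURN_ERROR_IF checks above enforce. In upstream's experimental API this path is reached through registration plus a parameter, roughly as follows (names from upstream zstd.h; the kernel wrapper may expose different spellings):

    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    ZSTD_registerSequenceProducer(cctx, NULL /* producer state */,
                                  literalsOnlyProducer);
    /* on producer error, fall back to the internal matchfinder
     * instead of failing the whole compression: */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableSeqProducerFallback, 1);
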
-static void ZSTD_copyBlockSequences(ZSTD_CCtx* zc)
+static size_t ZSTD_copyBlockSequences(SeqCollector* seqCollector, const SeqStore_t* seqStore, const U32 prevRepcodes[ZSTD_REP_NUM])
{
-   const seqStore_t* seqStore = ZSTD_getSeqStore(zc);
-   const seqDef* seqStoreSeqs = seqStore->sequencesStart;
-   size_t seqStoreSeqSize = seqStore->sequences - seqStoreSeqs;
-   size_t seqStoreLiteralsSize = (size_t)(seqStore->lit - seqStore->litStart);
-   size_t literalsRead = 0;
-   size_t lastLLSize;
+   const SeqDef* inSeqs = seqStore->sequencesStart;
+   const size_t nbInSequences = (size_t)(seqStore->sequences - inSeqs);
+   const size_t nbInLiterals = (size_t)(seqStore->lit - seqStore->litStart);

-   ZSTD_Sequence* outSeqs = &zc->seqCollector.seqStart[zc->seqCollector.seqIndex];
+   ZSTD_Sequence* outSeqs = seqCollector->seqIndex == 0 ? seqCollector->seqStart : seqCollector->seqStart + seqCollector->seqIndex;
+   const size_t nbOutSequences = nbInSequences + 1;
+   size_t nbOutLiterals = 0;
+   Repcodes_t repcodes;
    size_t i;
-   repcodes_t updatedRepcodes;
-
-   assert(zc->seqCollector.seqIndex + 1 < zc->seqCollector.maxSequences);
-   /* Ensure we have enough space for last literals "sequence" */
-   assert(zc->seqCollector.maxSequences >= seqStoreSeqSize + 1);
-   ZSTD_memcpy(updatedRepcodes.rep, zc->blockState.prevCBlock->rep, sizeof(repcodes_t));
-   for (i = 0; i < seqStoreSeqSize; ++i) {
-       U32 rawOffset = seqStoreSeqs[i].offBase - ZSTD_REP_NUM;
-       outSeqs[i].litLength = seqStoreSeqs[i].litLength;
-       outSeqs[i].matchLength = seqStoreSeqs[i].mlBase + MINMATCH;
+
+   /* Bounds check that we have enough space for every input sequence
+    * and the block delimiter
+    */
+   assert(seqCollector->seqIndex <= seqCollector->maxSequences);
+   RETURN_ERROR_IF(
+       nbOutSequences > (size_t)(seqCollector->maxSequences - seqCollector->seqIndex),
+       dstSize_tooSmall,
+       "Not enough space to copy sequences");
+
+   ZSTD_memcpy(&repcodes, prevRepcodes, sizeof(repcodes));
+   for (i = 0; i < nbInSequences; ++i) {
+       U32 rawOffset;
+       outSeqs[i].litLength = inSeqs[i].litLength;
+       outSeqs[i].matchLength = inSeqs[i].mlBase + MINMATCH;
        outSeqs[i].rep = 0;

+       /* Handle the possible single length >= 64K
+        * There can only be one because we add MINMATCH to every match length,
+        * and blocks are at most 128K.
+        */
        if (i == seqStore->longLengthPos) {
            if (seqStore->longLengthType == ZSTD_llt_literalLength) {
                outSeqs[i].litLength += 0x10000;
@@ -2832,46 +3380,75 @@ static void ZSTD_copyBlockSequences(ZSTD_CCtx* zc)
            }
        }

-       if (seqStoreSeqs[i].offBase <= ZSTD_REP_NUM) {
-           /* Derive the correct offset corresponding to a repcode */
-           outSeqs[i].rep = seqStoreSeqs[i].offBase;
+       /* Determine the raw offset given the offBase, which may be a repcode. */
+       if (OFFBASE_IS_REPCODE(inSeqs[i].offBase)) {
+           const U32 repcode = OFFBASE_TO_REPCODE(inSeqs[i].offBase);
+           assert(repcode > 0);
+           outSeqs[i].rep = repcode;
            if (outSeqs[i].litLength != 0) {
-               rawOffset = updatedRepcodes.rep[outSeqs[i].rep - 1];
+               rawOffset = repcodes.rep[repcode - 1];
            } else {
-               if (outSeqs[i].rep == 3) {
-                   rawOffset = updatedRepcodes.rep[0] - 1;
+               if (repcode == 3) {
+                   assert(repcodes.rep[0] > 1);
+                   rawOffset = repcodes.rep[0] - 1;
                } else {
-                   rawOffset = updatedRepcodes.rep[outSeqs[i].rep];
+                   rawOffset = repcodes.rep[repcode];
                }
            }
+       } else {
+           rawOffset = OFFBASE_TO_OFFSET(inSeqs[i].offBase);
        }
        outSeqs[i].offset = rawOffset;
-       /* seqStoreSeqs[i].offset == offCode+1, and ZSTD_updateRep() expects offCode
-          so we provide seqStoreSeqs[i].offset - 1 */
-       ZSTD_updateRep(updatedRepcodes.rep,
-                      seqStoreSeqs[i].offBase - 1,
-                      seqStoreSeqs[i].litLength == 0);
-       literalsRead += outSeqs[i].litLength;
+
+       /* Update repcode history for the sequence */
+       ZSTD_updateRep(repcodes.rep,
+                      inSeqs[i].offBase,
+                      inSeqs[i].litLength == 0);
+
+       nbOutLiterals += outSeqs[i].litLength;
    }
    /* Insert last literals (if any exist) in the block as a sequence with ml == off == 0.
     * If there are no last literals, then we'll emit (of: 0, ml: 0, ll: 0), which is a marker
     * for the block boundary, according to the API.
     */
-   assert(seqStoreLiteralsSize >= literalsRead);
-   lastLLSize = seqStoreLiteralsSize - literalsRead;
-   outSeqs[i].litLength = (U32)lastLLSize;
-   outSeqs[i].matchLength = outSeqs[i].offset = outSeqs[i].rep = 0;
-   seqStoreSeqSize++;
-   zc->seqCollector.seqIndex += seqStoreSeqSize;
+   assert(nbInLiterals >= nbOutLiterals);
+   {
+       const size_t lastLLSize = nbInLiterals - nbOutLiterals;
+       outSeqs[nbInSequences].litLength = (U32)lastLLSize;
+       outSeqs[nbInSequences].matchLength = 0;
+       outSeqs[nbInSequences].offset = 0;
+       assert(nbOutSequences == nbInSequences + 1);
+   }
+   seqCollector->seqIndex += nbOutSequences;
+   assert(seqCollector->seqIndex <= seqCollector->maxSequences);
+
+   return 0;
+}
+
+size_t ZSTD_sequenceBound(size_t srcSize) {
+   const size_t maxNbSeq = (srcSize / ZSTD_MINMATCH_MIN) + 1;
+   const size_t maxNbDelims = (srcSize / ZSTD_BLOCKSIZE_MAX_MIN) + 1;
+   return maxNbSeq + maxNbDelims;
+}

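ZSTD_sequenceBound() is deliberately coarse: one sequence per minimal match plus one delimiter per minimal block. Worked through with the usual constants (assuming ZSTD_MINMATCH_MIN == 3 and ZSTD_BLOCKSIZE_MAX_MIN == 1 KB), a 128 KB input gives:

    maxNbSeq    = 131072 / 3    + 1 = 43691
    maxNbDelims = 131072 / 1024 + 1 =   129
    ZSTD_sequenceBound(131072)      = 43820 sequences
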
size_t ZSTD_generateSequences(ZSTD_CCtx* zc, ZSTD_Sequence* outSeqs,
                              size_t outSeqsSize, const void* src, size_t srcSize)
{
    const size_t dstCapacity = ZSTD_compressBound(srcSize);
-   void* dst = ZSTD_customMalloc(dstCapacity, ZSTD_defaultCMem);
+   void* dst; /* Make C90 happy. */
    SeqCollector seqCollector;
+   {
+       int targetCBlockSize;
+       FORWARD_IF_ERROR(ZSTD_CCtx_getParameter(zc, ZSTD_c_targetCBlockSize, &targetCBlockSize), "");
+       RETURN_ERROR_IF(targetCBlockSize != 0, parameter_unsupported, "targetCBlockSize != 0");
+   }
+   {
+       int nbWorkers;
+       FORWARD_IF_ERROR(ZSTD_CCtx_getParameter(zc, ZSTD_c_nbWorkers, &nbWorkers), "");
+       RETURN_ERROR_IF(nbWorkers != 0, parameter_unsupported, "nbWorkers != 0");
+   }

+   dst = ZSTD_customMalloc(dstCapacity, ZSTD_defaultCMem);
    RETURN_ERROR_IF(dst == NULL, memory_allocation, "NULL pointer!");

    seqCollector.collectSequences = 1;
@@ -2880,8 +3457,12 @@ size_t ZSTD_generateSequences(ZSTD_CCtx* zc, ZSTD_Sequence* outSeqs,
    seqCollector.maxSequences = outSeqsSize;
    zc->seqCollector = seqCollector;

-   ZSTD_compress2(zc, dst, dstCapacity, src, srcSize);
-   ZSTD_customFree(dst, ZSTD_defaultCMem);
+   {
+       const size_t ret = ZSTD_compress2(zc, dst, dstCapacity, src, srcSize);
+       ZSTD_customFree(dst, ZSTD_defaultCMem);
+       FORWARD_IF_ERROR(ret, "ZSTD_compress2 failed");
+   }
+   assert(zc->seqCollector.seqIndex <= ZSTD_sequenceBound(srcSize));
    return zc->seqCollector.seqIndex;
}

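With the new error forwarding, ZSTD_generateSequences() now returns either a sequence count or an error code, so a caller should size the output with ZSTD_sequenceBound() and test the result. A usage sketch (error handling abbreviated):

    size_t const maxSeqs = ZSTD_sequenceBound(srcSize);
    ZSTD_Sequence* const seqs = malloc(maxSeqs * sizeof(*seqs));
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    size_t const nbSeqs = ZSTD_generateSequences(cctx, seqs, maxSeqs, src, srcSize);
    if (ZSTD_isError(nbSeqs)) { /* handle error */ }
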
@@ -2910,19 +3491,17 @@ static int ZSTD_isRLE(const BYTE* src, size_t length) {
    const size_t unrollMask = unrollSize - 1;
    const size_t prefixLength = length & unrollMask;
    size_t i;
-   size_t u;
    if (length == 1) return 1;
    /* Check if prefix is RLE first before using unrolled loop */
    if (prefixLength && ZSTD_count(ip+1, ip, ip+prefixLength) != prefixLength-1) {
        return 0;
    }
    for (i = prefixLength; i != length; i += unrollSize) {
+       size_t u;
        for (u = 0; u < unrollSize; u += sizeof(size_t)) {
            if (MEM_readST(ip + i + u) != valueST) {
                return 0;
-           }
-       }
-   }
+   }   }   }
    return 1;
}

@@ -2930,7 +3509,7 @@ static int ZSTD_isRLE(const BYTE* src, size_t length) {
 * This is just a heuristic based on the compressibility.
 * It may return both false positives and false negatives.
 */
-static int ZSTD_maybeRLE(seqStore_t const* seqStore)
+static int ZSTD_maybeRLE(SeqStore_t const* seqStore)
{
    size_t const nbSeqs = (size_t)(seqStore->sequences - seqStore->sequencesStart);
    size_t const nbLits = (size_t)(seqStore->lit - seqStore->litStart);
@@ -2938,7 +3517,8 @@ static int ZSTD_maybeRLE(seqStore_t const* seqStore)
    return nbSeqs < 4 && nbLits < 10;
}

-static void ZSTD_blockState_confirmRepcodesAndEntropyTables(ZSTD_blockState_t* const bs)
+static void
+ZSTD_blockState_confirmRepcodesAndEntropyTables(ZSTD_blockState_t* const bs)
{
    ZSTD_compressedBlockState_t* const tmp = bs->prevCBlock;
    bs->prevCBlock = bs->nextCBlock;
@@ -2946,12 +3526,14 @@ static void ZSTD_blockState_confirmRepcodesAndEntropyTables(ZSTD_blockState_t* c
}

/* Writes the block header */
-static void writeBlockHeader(void* op, size_t cSize, size_t blockSize, U32 lastBlock) {
+static void
+writeBlockHeader(void* op, size_t cSize, size_t blockSize, U32 lastBlock)
+{
    U32 const cBlockHeader = cSize == 1 ?
                             lastBlock + (((U32)bt_rle)<<1) + (U32)(blockSize << 3) :
                             lastBlock + (((U32)bt_compressed)<<1) + (U32)(cSize << 3);
    MEM_writeLE24(op, cBlockHeader);
-   DEBUGLOG(3, "writeBlockHeader: cSize: %zu blockSize: %zu lastBlock: %u", cSize, blockSize, lastBlock);
+   DEBUGLOG(5, "writeBlockHeader: cSize: %zu blockSize: %zu lastBlock: %u", cSize, blockSize, lastBlock);
}

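writeBlockHeader() packs the 3-byte block header as lastBlock in bit 0, the block type in bits 1-2, and a size field from bit 3 up. A worked example: a non-last compressed block whose payload is cSize == 200 bytes (with bt_compressed == 2) yields

    header = 0 + (2 << 1) + (200 << 3) = 4 + 1600 = 1604 = 0x000644

written little-endian by MEM_writeLE24().
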
@@ -2959,13 +3541,16 @@ static void writeBlockHeader(void* op, size_t cSize, size_t blockSize, U32 lastB
 * Stores literals block type (raw, rle, compressed, repeat) and
 * huffman description table to hufMetadata.
 * Requires ENTROPY_WORKSPACE_SIZE workspace
- * @return : size of huffman description table or error code */
-static size_t ZSTD_buildBlockEntropyStats_literals(void* const src, size_t srcSize,
-                                            const ZSTD_hufCTables_t* prevHuf,
-                                                  ZSTD_hufCTables_t* nextHuf,
-                                                  ZSTD_hufCTablesMetadata_t* hufMetadata,
-                                                  const int literalsCompressionIsDisabled,
-                                                  void* workspace, size_t wkspSize)
+ * @return : size of huffman description table, or an error code
+ */
+static size_t
+ZSTD_buildBlockEntropyStats_literals(void* const src, size_t srcSize,
+                                     const ZSTD_hufCTables_t* prevHuf,
+                                     ZSTD_hufCTables_t* nextHuf,
+                                     ZSTD_hufCTablesMetadata_t* hufMetadata,
+                                     const int literalsCompressionIsDisabled,
+                                     void* workspace, size_t wkspSize,
+                                     int hufFlags)
{
    BYTE* const wkspStart = (BYTE*)workspace;
    BYTE* const wkspEnd = wkspStart + wkspSize;
@@ -2973,9 +3558,9 @@ static size_t ZSTD_buildBlockEntropyStats_literals(void* const src, size_t srcSi
    unsigned* const countWksp = (unsigned*)workspace;
    const size_t countWkspSize = (HUF_SYMBOLVALUE_MAX + 1) * sizeof(unsigned);
    BYTE* const nodeWksp = countWkspStart + countWkspSize;
-   const size_t nodeWkspSize = wkspEnd-nodeWksp;
+   const size_t nodeWkspSize = (size_t)(wkspEnd - nodeWksp);
    unsigned maxSymbolValue = HUF_SYMBOLVALUE_MAX;
-   unsigned huffLog = HUF_TABLELOG_DEFAULT;
+   unsigned huffLog = LitHufLog;
    HUF_repeat repeat = prevHuf->repeatMode;
    DEBUGLOG(5, "ZSTD_buildBlockEntropyStats_literals (srcSize=%zu)", srcSize);

@@ -2990,73 +3575,77 @@ static size_t ZSTD_buildBlockEntropyStats_literals(void* const src, size_t srcSi

    /* small ? don't even attempt compression (speed opt) */
#ifndef COMPRESS_LITERALS_SIZE_MIN
-#define COMPRESS_LITERALS_SIZE_MIN 63
+# define COMPRESS_LITERALS_SIZE_MIN 63  /* heuristic */
#endif
    {   size_t const minLitSize = (prevHuf->repeatMode == HUF_repeat_valid) ? 6 : COMPRESS_LITERALS_SIZE_MIN;
        if (srcSize <= minLitSize) {
            DEBUGLOG(5, "set_basic - too small");
            hufMetadata->hType = set_basic;
            return 0;
-       }
-   }
+   }   }

    /* Scan input and build symbol stats */
-   {   size_t const largest = HIST_count_wksp (countWksp, &maxSymbolValue, (const BYTE*)src, srcSize, workspace, wkspSize);
+   {   size_t const largest =
+           HIST_count_wksp (countWksp, &maxSymbolValue,
+                            (const BYTE*)src, srcSize,
+                            workspace, wkspSize);
        FORWARD_IF_ERROR(largest, "HIST_count_wksp failed");
        if (largest == srcSize) {
+           /* only one literal symbol */
            DEBUGLOG(5, "set_rle");
            hufMetadata->hType = set_rle;
            return 0;
        }
        if (largest <= (srcSize >> 7)+4) {
+           /* heuristic: likely not compressible */
            DEBUGLOG(5, "set_basic - no gain");
            hufMetadata->hType = set_basic;
            return 0;
-       }
-   }
+   }   }

    /* Validate the previous Huffman table */
-   if (repeat == HUF_repeat_check && !HUF_validateCTable((HUF_CElt const*)prevHuf->CTable, countWksp, maxSymbolValue)) {
+   if (repeat == HUF_repeat_check
+     && !HUF_validateCTable((HUF_CElt const*)prevHuf->CTable, countWksp, maxSymbolValue)) {
        repeat = HUF_repeat_none;
    }

    /* Build Huffman Tree */
    ZSTD_memset(nextHuf->CTable, 0, sizeof(nextHuf->CTable));
-   huffLog = HUF_optimalTableLog(huffLog, srcSize, maxSymbolValue);
+   huffLog = HUF_optimalTableLog(huffLog, srcSize, maxSymbolValue, nodeWksp, nodeWkspSize, nextHuf->CTable, countWksp, hufFlags);
+   assert(huffLog <= LitHufLog);
    {   size_t const maxBits = HUF_buildCTable_wksp((HUF_CElt*)nextHuf->CTable, countWksp,
                                                    maxSymbolValue, huffLog,
                                                    nodeWksp, nodeWkspSize);
        FORWARD_IF_ERROR(maxBits, "HUF_buildCTable_wksp");
        huffLog = (U32)maxBits;
-       {   /* Build and write the CTable */
-           size_t const newCSize = HUF_estimateCompressedSize(
-               (HUF_CElt*)nextHuf->CTable, countWksp, maxSymbolValue);
-           size_t const hSize = HUF_writeCTable_wksp(
-               hufMetadata->hufDesBuffer, sizeof(hufMetadata->hufDesBuffer),
-               (HUF_CElt*)nextHuf->CTable, maxSymbolValue, huffLog,
-               nodeWksp, nodeWkspSize);
-           /* Check against repeating the previous CTable */
-           if (repeat != HUF_repeat_none) {
-               size_t const oldCSize = HUF_estimateCompressedSize(
-                   (HUF_CElt const*)prevHuf->CTable, countWksp, maxSymbolValue);
-               if (oldCSize < srcSize && (oldCSize <= hSize + newCSize || hSize + 12 >= srcSize)) {
-                   DEBUGLOG(5, "set_repeat - smaller");
-                   ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf));
-                   hufMetadata->hType = set_repeat;
-                   return 0;
-               }
-           }
-           if (newCSize + hSize >= srcSize) {
-               DEBUGLOG(5, "set_basic - no gains");
-               ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf));
-               hufMetadata->hType = set_basic;
-               return 0;
-           }
-           DEBUGLOG(5, "set_compressed (hSize=%u)", (U32)hSize);
-           hufMetadata->hType = set_compressed;
-           nextHuf->repeatMode = HUF_repeat_check;
-           return hSize;
-       }
+   }
+   {   /* Build and write the CTable */
+       size_t const newCSize = HUF_estimateCompressedSize(
+           (HUF_CElt*)nextHuf->CTable, countWksp, maxSymbolValue);
+       size_t const hSize = HUF_writeCTable_wksp(
+           hufMetadata->hufDesBuffer, sizeof(hufMetadata->hufDesBuffer),
+           (HUF_CElt*)nextHuf->CTable, maxSymbolValue, huffLog,
+           nodeWksp, nodeWkspSize);
+       /* Check against repeating the previous CTable */
+       if (repeat != HUF_repeat_none) {
+           size_t const oldCSize = HUF_estimateCompressedSize(
+               (HUF_CElt const*)prevHuf->CTable, countWksp, maxSymbolValue);
+           if (oldCSize < srcSize && (oldCSize <= hSize + newCSize || hSize + 12 >= srcSize)) {
+               DEBUGLOG(5, "set_repeat - smaller");
+               ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf));
+               hufMetadata->hType = set_repeat;
+               return 0;
+       }   }
+       if (newCSize + hSize >= srcSize) {
+           DEBUGLOG(5, "set_basic - no gains");
+           ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf));
+           hufMetadata->hType = set_basic;
+           return 0;
+       }
+       DEBUGLOG(5, "set_compressed (hSize=%u)", (U32)hSize);
+       hufMetadata->hType = set_compressed;
+       nextHuf->repeatMode = HUF_repeat_check;
+       return hSize;
    }
}

@@ -3066,8 +3655,9 @@ static size_t ZSTD_buildBlockEntropyStats_literals(void* const src, size_t srcSi
 * and updates nextEntropy to the appropriate repeatMode.
 */
static ZSTD_symbolEncodingTypeStats_t
-ZSTD_buildDummySequencesStatistics(ZSTD_fseCTables_t* nextEntropy) {
-   ZSTD_symbolEncodingTypeStats_t stats = {set_basic, set_basic, set_basic, 0, 0};
+ZSTD_buildDummySequencesStatistics(ZSTD_fseCTables_t* nextEntropy)
+{
+   ZSTD_symbolEncodingTypeStats_t stats = {set_basic, set_basic, set_basic, 0, 0, 0};
    nextEntropy->litlength_repeatMode = FSE_repeat_none;
    nextEntropy->offcode_repeatMode = FSE_repeat_none;
    nextEntropy->matchlength_repeatMode = FSE_repeat_none;
@@ -3078,16 +3668,18 @@ ZSTD_buildDummySequencesStatistics(ZSTD_fseCTables_t* nextEntropy)
 * Builds entropy for the sequences.
 * Stores symbol compression modes and fse table to fseMetadata.
 * Requires ENTROPY_WORKSPACE_SIZE wksp.
- * @return : size of fse tables or error code */
-static size_t ZSTD_buildBlockEntropyStats_sequences(seqStore_t* seqStorePtr,
-                                              const ZSTD_fseCTables_t* prevEntropy,
-                                                    ZSTD_fseCTables_t* nextEntropy,
-                                              const ZSTD_CCtx_params* cctxParams,
-                                                    ZSTD_fseCTablesMetadata_t* fseMetadata,
-                                                    void* workspace, size_t wkspSize)
+ * @return : size of fse tables or error code */
+static size_t
+ZSTD_buildBlockEntropyStats_sequences(
+               const SeqStore_t* seqStorePtr,
+               const ZSTD_fseCTables_t* prevEntropy,
+                     ZSTD_fseCTables_t* nextEntropy,
+               const ZSTD_CCtx_params* cctxParams,
+                     ZSTD_fseCTablesMetadata_t* fseMetadata,
+               void* workspace, size_t wkspSize)
{
    ZSTD_strategy const strategy = cctxParams->cParams.strategy;
-   size_t const nbSeq = seqStorePtr->sequences - seqStorePtr->sequencesStart;
+   size_t const nbSeq = (size_t)(seqStorePtr->sequences - seqStorePtr->sequencesStart);
    BYTE* const ostart = fseMetadata->fseTablesBuffer;
    BYTE* const oend = ostart + sizeof(fseMetadata->fseTablesBuffer);
    BYTE* op = ostart;
@@ -3103,9 +3695,9 @@ static size_t ZSTD_buildBlockEntropyStats_sequences(seqStore_t* seqStorePtr,
                                          entropyWorkspace, entropyWorkspaceSize)
                       : ZSTD_buildDummySequencesStatistics(nextEntropy);
    FORWARD_IF_ERROR(stats.size, "ZSTD_buildSequencesStatistics failed!");
-   fseMetadata->llType = (symbolEncodingType_e) stats.LLtype;
-   fseMetadata->ofType = (symbolEncodingType_e) stats.Offtype;
-   fseMetadata->mlType = (symbolEncodingType_e) stats.MLtype;
+   fseMetadata->llType = (SymbolEncodingType_e) stats.LLtype;
+   fseMetadata->ofType = (SymbolEncodingType_e) stats.Offtype;
+   fseMetadata->mlType = (SymbolEncodingType_e) stats.MLtype;
    fseMetadata->lastCountSize = stats.lastCountSize;
    return stats.size;
}

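Condensed, the literals-mode decision implemented above is a cascade of cheap early exits before the expensive table build (a paraphrase, not part of the patch):

    srcSize <= minLitSize           -> set_basic  (too small to bother)
    largest == srcSize              -> set_rle    (single repeated symbol)
    largest <= (srcSize >> 7) + 4   -> set_basic  (near-flat histogram)
    otherwise build a CTable; prefer set_repeat if reusing the previous
    table is estimated cheaper, set_basic if no byte is saved,
    else set_compressed and return the table description size.
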
@@ -3114,23 +3706,28 @@ static size_t ZSTD_buildBlockEntropyStats_sequences(seqStore_t* seqStorePtr,
/* ZSTD_buildBlockEntropyStats() :
 * Builds entropy for the block.
 * Requires workspace size ENTROPY_WORKSPACE_SIZE
- *
- * @return : 0 on success or error code
+ * @return : 0 on success, or an error code
+ * Note : also employed in superblock
 */
-size_t ZSTD_buildBlockEntropyStats(seqStore_t* seqStorePtr,
-                             const ZSTD_entropyCTables_t* prevEntropy,
-                                   ZSTD_entropyCTables_t* nextEntropy,
-                             const ZSTD_CCtx_params* cctxParams,
-                                   ZSTD_entropyCTablesMetadata_t* entropyMetadata,
-                                   void* workspace, size_t wkspSize)
-{
-   size_t const litSize = seqStorePtr->lit - seqStorePtr->litStart;
+size_t ZSTD_buildBlockEntropyStats(
+               const SeqStore_t* seqStorePtr,
+               const ZSTD_entropyCTables_t* prevEntropy,
+                     ZSTD_entropyCTables_t* nextEntropy,
+               const ZSTD_CCtx_params* cctxParams,
+                     ZSTD_entropyCTablesMetadata_t* entropyMetadata,
+               void* workspace, size_t wkspSize)
+{
+   size_t const litSize = (size_t)(seqStorePtr->lit - seqStorePtr->litStart);
+   int const huf_useOptDepth = (cctxParams->cParams.strategy >= HUF_OPTIMAL_DEPTH_THRESHOLD);
+   int const hufFlags = huf_useOptDepth ? HUF_flags_optimalDepth : 0;
+
    entropyMetadata->hufMetadata.hufDesSize =
        ZSTD_buildBlockEntropyStats_literals(seqStorePtr->litStart, litSize,
                                            &prevEntropy->huf, &nextEntropy->huf,
                                            &entropyMetadata->hufMetadata,
                                            ZSTD_literalsCompressionIsDisabled(cctxParams),
-                                           workspace, wkspSize);
+                                           workspace, wkspSize, hufFlags);
+   FORWARD_IF_ERROR(entropyMetadata->hufMetadata.hufDesSize, "ZSTD_buildBlockEntropyStats_literals failed");
    entropyMetadata->fseMetadata.fseTablesSize =
        ZSTD_buildBlockEntropyStats_sequences(seqStorePtr,
@@ -3143,11 +3740,12 @@ size_t ZSTD_buildBlockEntropyStats(seqStore_t* seqStorePtr,
}

/* Returns the size estimate for the literals section (header + content) of a block */
-static size_t ZSTD_estimateBlockSize_literal(const BYTE* literals, size_t litSize,
-                                             const ZSTD_hufCTables_t* huf,
-                                             const ZSTD_hufCTablesMetadata_t* hufMetadata,
-                                             void* workspace, size_t wkspSize,
-                                             int writeEntropy)
+static size_t
+ZSTD_estimateBlockSize_literal(const BYTE* literals, size_t litSize,
+                               const ZSTD_hufCTables_t* huf,
+                               const ZSTD_hufCTablesMetadata_t* hufMetadata,
+                               void* workspace, size_t wkspSize,
+                               int writeEntropy)
{
    unsigned* const countWksp = (unsigned*)workspace;
    unsigned maxSymbolValue = HUF_SYMBOLVALUE_MAX;
@@ -3169,12 +3767,13 @@ static size_t ZSTD_estimateBlockSize_literal(const BYTE* literals, size_t litSiz
}

/* Returns the size estimate for the FSE-compressed symbols (of, ml, ll) of a block */
-static size_t ZSTD_estimateBlockSize_symbolType(symbolEncodingType_e type,
-                   const BYTE* codeTable, size_t nbSeq, unsigned maxCode,
-                   const FSE_CTable* fseCTable,
-                   const U8* additionalBits,
-                   short const* defaultNorm, U32 defaultNormLog, U32 defaultMax,
-                   void* workspace, size_t wkspSize)
+static size_t
+ZSTD_estimateBlockSize_symbolType(SymbolEncodingType_e type,
+                   const BYTE* codeTable, size_t nbSeq, unsigned maxCode,
+                   const FSE_CTable* fseCTable,
+                   const U8* additionalBits,
+                   short const* defaultNorm, U32 defaultNormLog, U32 defaultMax,
+                   void* workspace, size_t wkspSize)
{
    unsigned* const countWksp = (unsigned*)workspace;
    const BYTE* ctp = codeTable;
@@ -3206,116 +3805,121 @@ static size_t ZSTD_estimateBlockSize_symbolType(symbolEncodingType_e type,
}

/* Returns the size estimate for the sequences section (header + content) of a block */
-static size_t ZSTD_estimateBlockSize_sequences(const BYTE* ofCodeTable,
-                                               const BYTE* llCodeTable,
-                                               const BYTE* mlCodeTable,
-                                               size_t nbSeq,
-                                               const ZSTD_fseCTables_t* fseTables,
-                                               const ZSTD_fseCTablesMetadata_t* fseMetadata,
-                                               void* workspace, size_t wkspSize,
-                                               int writeEntropy)
+static size_t
+ZSTD_estimateBlockSize_sequences(const BYTE* ofCodeTable,
+                                 const BYTE* llCodeTable,
+                                 const BYTE* mlCodeTable,
+                                 size_t nbSeq,
+                                 const ZSTD_fseCTables_t* fseTables,
+                                 const ZSTD_fseCTablesMetadata_t* fseMetadata,
+                                 void* workspace, size_t wkspSize,
+                                 int writeEntropy)
{
    size_t sequencesSectionHeaderSize = 1 /* seqHead */ + 1 /* min seqSize size */ + (nbSeq >= 128) + (nbSeq >= LONGNBSEQ);
    size_t cSeqSizeEstimate = 0;
    cSeqSizeEstimate += ZSTD_estimateBlockSize_symbolType(fseMetadata->ofType, ofCodeTable, nbSeq, MaxOff,
-                                        fseTables->offcodeCTable, NULL,
-                                        OF_defaultNorm, OF_defaultNormLog, DefaultMaxOff,
-                                        workspace, wkspSize);
+                                        fseTables->offcodeCTable, NULL,
+                                        OF_defaultNorm, OF_defaultNormLog, DefaultMaxOff,
+                                        workspace, wkspSize);
    cSeqSizeEstimate += ZSTD_estimateBlockSize_symbolType(fseMetadata->llType, llCodeTable, nbSeq, MaxLL,
-                                        fseTables->litlengthCTable, LL_bits,
-                                        LL_defaultNorm, LL_defaultNormLog, MaxLL,
-                                        workspace, wkspSize);
+                                        fseTables->litlengthCTable, LL_bits,
+                                        LL_defaultNorm, LL_defaultNormLog, MaxLL,
+                                        workspace, wkspSize);
    cSeqSizeEstimate += ZSTD_estimateBlockSize_symbolType(fseMetadata->mlType, mlCodeTable, nbSeq, MaxML,
-                                        fseTables->matchlengthCTable, ML_bits,
-                                        ML_defaultNorm, ML_defaultNormLog, MaxML,
-                                        workspace, wkspSize);
+                                        fseTables->matchlengthCTable, ML_bits,
+                                        ML_defaultNorm, ML_defaultNormLog, MaxML,
+                                        workspace, wkspSize);
    if (writeEntropy) cSeqSizeEstimate += fseMetadata->fseTablesSize;
    return cSeqSizeEstimate + sequencesSectionHeaderSize;
}

/* Returns the size estimate for a given stream of literals, of, ll, ml */
-static size_t ZSTD_estimateBlockSize(const BYTE* literals, size_t litSize,
-                                     const BYTE* ofCodeTable,
-                                     const BYTE* llCodeTable,
-                                     const BYTE* mlCodeTable,
-                                     size_t nbSeq,
-                                     const ZSTD_entropyCTables_t* entropy,
-                                     const ZSTD_entropyCTablesMetadata_t* entropyMetadata,
-                                     void* workspace, size_t wkspSize,
-                                     int writeLitEntropy, int writeSeqEntropy) {
+static size_t
+ZSTD_estimateBlockSize(const BYTE* literals, size_t litSize,
+                       const BYTE* ofCodeTable,
+                       const BYTE* llCodeTable,
+                       const BYTE* mlCodeTable,
+                       size_t nbSeq,
+                       const ZSTD_entropyCTables_t* entropy,
+                       const ZSTD_entropyCTablesMetadata_t* entropyMetadata,
+                       void* workspace, size_t wkspSize,
+                       int writeLitEntropy, int writeSeqEntropy)
+{
    size_t const literalsSize = ZSTD_estimateBlockSize_literal(literals, litSize,
-                                                        &entropy->huf, &entropyMetadata->hufMetadata,
-                                                        workspace, wkspSize, writeLitEntropy);
+                                        &entropy->huf, &entropyMetadata->hufMetadata,
+                                        workspace, wkspSize, writeLitEntropy);
    size_t const seqSize = ZSTD_estimateBlockSize_sequences(ofCodeTable, llCodeTable, mlCodeTable,
-                                                        nbSeq, &entropy->fse, &entropyMetadata->fseMetadata,
-                                                        workspace, wkspSize, writeSeqEntropy);
+                                        nbSeq, &entropy->fse, &entropyMetadata->fseMetadata,
+                                        workspace, wkspSize, writeSeqEntropy);
    return seqSize + literalsSize + ZSTD_blockHeaderSize;
}

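The estimate is compositional, mirroring the on-wire block layout:

    estimate = literalsSection  (header + Huffman-coded literals)
             + sequencesSection (header + FSE-coded of/ll/ml streams)
             + ZSTD_blockHeaderSize (3 bytes)

with entropy-table costs counted only when the corresponding writeEntropy flag is set, i.e. when fresh tables would actually be emitted.
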
/* Builds entropy statistics and uses them for blocksize estimation.
 *
- * Returns the estimated compressed size of the seqStore, or a zstd error.
+ * @return: estimated compressed size of the seqStore, or a zstd error.
 */
-static size_t ZSTD_buildEntropyStatisticsAndEstimateSubBlockSize(seqStore_t* seqStore, ZSTD_CCtx* zc) {
-   ZSTD_entropyCTablesMetadata_t* entropyMetadata = &zc->blockSplitCtx.entropyMetadata;
+static size_t
+ZSTD_buildEntropyStatisticsAndEstimateSubBlockSize(SeqStore_t* seqStore, ZSTD_CCtx* zc)
+{
+   ZSTD_entropyCTablesMetadata_t* const entropyMetadata = &zc->blockSplitCtx.entropyMetadata;
    DEBUGLOG(6, "ZSTD_buildEntropyStatisticsAndEstimateSubBlockSize()");
    FORWARD_IF_ERROR(ZSTD_buildBlockEntropyStats(seqStore,
                    &zc->blockState.prevCBlock->entropy,
                    &zc->blockState.nextCBlock->entropy,
                    &zc->appliedParams,
                    entropyMetadata,
-                   zc->entropyWorkspace, ENTROPY_WORKSPACE_SIZE /* statically allocated in resetCCtx */), "");
-   return ZSTD_estimateBlockSize(seqStore->litStart, (size_t)(seqStore->lit - seqStore->litStart),
+                   zc->tmpWorkspace, zc->tmpWkspSize), "");
+   return ZSTD_estimateBlockSize(
+                   seqStore->litStart, (size_t)(seqStore->lit - seqStore->litStart),
                    seqStore->ofCode, seqStore->llCode, seqStore->mlCode,
                    (size_t)(seqStore->sequences - seqStore->sequencesStart),
-                   &zc->blockState.nextCBlock->entropy, entropyMetadata, zc->entropyWorkspace, ENTROPY_WORKSPACE_SIZE,
+                   &zc->blockState.nextCBlock->entropy,
+                   entropyMetadata,
+                   zc->tmpWorkspace, zc->tmpWkspSize,
                    (int)(entropyMetadata->hufMetadata.hType == set_compressed), 1);
}

/* Returns literals bytes represented in a seqStore */
-static size_t ZSTD_countSeqStoreLiteralsBytes(const seqStore_t* const seqStore) {
+static size_t ZSTD_countSeqStoreLiteralsBytes(const SeqStore_t* const seqStore)
+{
    size_t literalsBytes = 0;
-   size_t const nbSeqs = seqStore->sequences - seqStore->sequencesStart;
+   size_t const nbSeqs = (size_t)(seqStore->sequences - seqStore->sequencesStart);
    size_t i;
    for (i = 0; i < nbSeqs; ++i) {
-       seqDef seq = seqStore->sequencesStart[i];
+       SeqDef const seq = seqStore->sequencesStart[i];
        literalsBytes += seq.litLength;
        if (i == seqStore->longLengthPos && seqStore->longLengthType == ZSTD_llt_literalLength) {
            literalsBytes += 0x10000;
-       }
-   }
+   }   }
    return literalsBytes;
}

/* Returns match bytes represented in a seqStore */
-static size_t ZSTD_countSeqStoreMatchBytes(const seqStore_t* const seqStore) {
+static size_t ZSTD_countSeqStoreMatchBytes(const SeqStore_t* const seqStore)
+{
    size_t matchBytes = 0;
-   size_t const nbSeqs = seqStore->sequences - seqStore->sequencesStart;
+   size_t const nbSeqs = (size_t)(seqStore->sequences - seqStore->sequencesStart);
    size_t i;
    for (i = 0; i < nbSeqs; ++i) {
-       seqDef seq = seqStore->sequencesStart[i];
+       SeqDef seq = seqStore->sequencesStart[i];
        matchBytes += seq.mlBase + MINMATCH;
        if (i == seqStore->longLengthPos && seqStore->longLengthType == ZSTD_llt_matchLength) {
            matchBytes += 0x10000;
-       }
-   }
+   }   }
    return matchBytes;
}

/* Derives the seqStore that is a chunk of the originalSeqStore from [startIdx, endIdx).
 * Stores the result in resultSeqStore.
 */
-static void ZSTD_deriveSeqStoreChunk(seqStore_t* resultSeqStore,
-                                     const seqStore_t* originalSeqStore,
-                                     size_t startIdx, size_t endIdx) {
-   BYTE* const litEnd = originalSeqStore->lit;
-   size_t literalsBytes;
-   size_t literalsBytesPreceding = 0;
-
+static void ZSTD_deriveSeqStoreChunk(SeqStore_t* resultSeqStore,
+                                     const SeqStore_t* originalSeqStore,
+                                     size_t startIdx, size_t endIdx)
+{
    *resultSeqStore = *originalSeqStore;
    if (startIdx > 0) {
        resultSeqStore->sequences = originalSeqStore->sequencesStart + startIdx;
-       literalsBytesPreceding = ZSTD_countSeqStoreLiteralsBytes(resultSeqStore);
+       resultSeqStore->litStart += ZSTD_countSeqStoreLiteralsBytes(resultSeqStore);
    }

    /* Move longLengthPos into the correct position if necessary */
@@ -3328,13 +3932,12 @@ static void ZSTD_deriveSeqStoreChunk(seqStore_t* resultSeqStore,
    }
    resultSeqStore->sequencesStart = originalSeqStore->sequencesStart + startIdx;
    resultSeqStore->sequences = originalSeqStore->sequencesStart + endIdx;
-   literalsBytes = ZSTD_countSeqStoreLiteralsBytes(resultSeqStore);
-   resultSeqStore->litStart += literalsBytesPreceding;
    if (endIdx == (size_t)(originalSeqStore->sequences - originalSeqStore->sequencesStart)) {
        /* This accounts for possible last literals if the derived chunk reaches the end of the block */
-       resultSeqStore->lit = litEnd;
+       assert(resultSeqStore->lit == originalSeqStore->lit);
    } else {
-       resultSeqStore->lit = resultSeqStore->litStart+literalsBytes;
+       size_t const literalsBytes = ZSTD_countSeqStoreLiteralsBytes(resultSeqStore);
+       resultSeqStore->lit = resultSeqStore->litStart + literalsBytes;
    }
    resultSeqStore->llCode += startIdx;
    resultSeqStore->mlCode += startIdx;
@@ -3342,20 +3945,26 @@ static void ZSTD_deriveSeqStoreChunk(seqStore_t* resultSeqStore,
}

/*
- * Returns the raw offset represented by the combination of offCode, ll0, and repcode history.
- * offCode must represent a repcode in the numeric representation of ZSTD_storeSeq().
+ * Returns the raw offset represented by the combination of offBase, ll0, and repcode history.
+ * offBase must represent a repcode in the numeric representation of ZSTD_storeSeq().
 */
static U32
-ZSTD_resolveRepcodeToRawOffset(const U32 rep[ZSTD_REP_NUM], const U32 offCode, const U32 ll0)
-{
-   U32 const adjustedOffCode = STORED_REPCODE(offCode) - 1 + ll0;  /* [ 0 - 3 ] */
-   assert(STORED_IS_REPCODE(offCode));
-   if (adjustedOffCode == ZSTD_REP_NUM) {
-       /* litlength == 0 and offCode == 2 implies selection of first repcode - 1 */
-       assert(rep[0] > 0);
+ZSTD_resolveRepcodeToRawOffset(const U32 rep[ZSTD_REP_NUM], const U32 offBase, const U32 ll0)
+{
+   U32 const adjustedRepCode = OFFBASE_TO_REPCODE(offBase) - 1 + ll0;  /* [ 0 - 3 ] */
+   assert(OFFBASE_IS_REPCODE(offBase));
+   if (adjustedRepCode == ZSTD_REP_NUM) {
+       assert(ll0);
+       /* litlength == 0 and offCode == 2 implies selection of first repcode - 1
+        * This is only valid if it results in a valid offset value, aka > 0.
+        * Note : it may happen that `rep[0]==1` in exceptional circumstances.
+        * In which case this function will return 0, which is an invalid offset.
+        * It's not an issue though, since this value will be
+        * compared and discarded within ZSTD_seqStore_resolveOffCodes().
+        */
        return rep[0] - 1;
    }
-   return rep[adjustedOffCode];
+   return rep[adjustedRepCode];
}

/*
@@ -3371,30 +3980,33 @@ ZSTD_resolveRepcodeToRawOffset(const U32 rep[ZSTD_REP_NUM], const U32 offCode, c
 * 1-3 : repcode 1-3
 * 4+ : real_offset+3
 */
-static void ZSTD_seqStore_resolveOffCodes(repcodes_t* const dRepcodes, repcodes_t* const cRepcodes,
-                                          seqStore_t* const seqStore, U32 const nbSeq) {
+static void
+ZSTD_seqStore_resolveOffCodes(Repcodes_t* const dRepcodes, Repcodes_t* const cRepcodes,
+                              const SeqStore_t* const seqStore, U32 const nbSeq)
+{
    U32 idx = 0;
+   U32 const longLitLenIdx = seqStore->longLengthType == ZSTD_llt_literalLength ? seqStore->longLengthPos : nbSeq;
    for (; idx < nbSeq; ++idx) {
-       seqDef* const seq = seqStore->sequencesStart + idx;
-       U32 const ll0 = (seq->litLength == 0);
-       U32 const offCode = OFFBASE_TO_STORED(seq->offBase);
-       assert(seq->offBase > 0);
-       if (STORED_IS_REPCODE(offCode)) {
-           U32 const dRawOffset = ZSTD_resolveRepcodeToRawOffset(dRepcodes->rep, offCode, ll0);
-           U32 const cRawOffset = ZSTD_resolveRepcodeToRawOffset(cRepcodes->rep, offCode, ll0);
+       SeqDef* const seq = seqStore->sequencesStart + idx;
+       U32 const ll0 = (seq->litLength == 0) && (idx != longLitLenIdx);
+       U32 const offBase = seq->offBase;
+       assert(offBase > 0);
+       if (OFFBASE_IS_REPCODE(offBase)) {
+           U32 const dRawOffset = ZSTD_resolveRepcodeToRawOffset(dRepcodes->rep, offBase, ll0);
+           U32 const cRawOffset = ZSTD_resolveRepcodeToRawOffset(cRepcodes->rep, offBase, ll0);
            /* Adjust simulated decompression repcode history if we come across a mismatch. Replace
             * the repcode with the offset it actually references, determined by the compression
             * repcode history.
             */
            if (dRawOffset != cRawOffset) {
-               seq->offBase = cRawOffset + ZSTD_REP_NUM;
+               seq->offBase = OFFSET_TO_OFFBASE(cRawOffset);
            }
        }
        /* Compression repcode history is always updated with values directly from the unmodified seqStore.
         * Decompression repcode history may use modified seq->offset value taken from compression repcode history.
         */
-       ZSTD_updateRep(dRepcodes->rep, OFFBASE_TO_STORED(seq->offBase), ll0);
-       ZSTD_updateRep(cRepcodes->rep, offCode, ll0);
+       ZSTD_updateRep(dRepcodes->rep, seq->offBase, ll0);
+       ZSTD_updateRep(cRepcodes->rep, offBase, ll0);
    }
}

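A worked example of the resolution above: with repcode history rep = {8, 4, 2}, a sequence carrying repcode 1 with litLength > 0 resolves to offset 8 (rep[0]); the same repcode with litLength == 0 shifts by one and resolves to offset 4 (rep[1]); and repcode 3 with litLength == 0 hits the special case and resolves to rep[0] - 1 == 7.
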
/*
@@ -3404,10 +4016,11 @@ static void ZSTD_seqStore_resolveOffCodes(repcodes_t* const dRepcodes, repcodes_
 * Returns the total size of that block (including header) or a ZSTD error code.
 */
static size_t
-ZSTD_compressSeqStore_singleBlock(ZSTD_CCtx* zc, seqStore_t* const seqStore,
-                                  repcodes_t* const dRep, repcodes_t* const cRep,
+ZSTD_compressSeqStore_singleBlock(ZSTD_CCtx* zc,
+                                  const SeqStore_t* const seqStore,
+                                  Repcodes_t* const dRep, Repcodes_t* const cRep,
                                  void* dst, size_t dstCapacity,
-                                 const void* src, size_t srcSize,
+                                 const void* src, size_t srcSize,
                                  U32 lastBlock, U32 isPartition)
{
    const U32 rleMaxLength = 25;
@@ -3417,7 +4030,7 @@ ZSTD_compressSeqStore_singleBlock(ZSTD_CCtx* zc, seqStore_t* const seqStore,
    size_t cSeqsSize;

    /* In case of an RLE or raw block, the simulated decompression repcode history must be reset */
-   repcodes_t const dRepOriginal = *dRep;
+   Repcodes_t const dRepOriginal = *dRep;
    DEBUGLOG(5, "ZSTD_compressSeqStore_singleBlock");
    if (isPartition)
        ZSTD_seqStore_resolveOffCodes(dRep, cRep, seqStore, (U32)(seqStore->sequences - seqStore->sequencesStart));
@@ -3428,7 +4041,7 @@ ZSTD_compressSeqStore_singleBlock(ZSTD_CCtx* zc, seqStore_t* const seqStore,
                &zc->appliedParams,
                op + ZSTD_blockHeaderSize, dstCapacity - ZSTD_blockHeaderSize,
                srcSize,
-               zc->entropyWorkspace, ENTROPY_WORKSPACE_SIZE /* statically allocated in resetCCtx */,
+               zc->tmpWorkspace, zc->tmpWkspSize /* statically allocated in resetCCtx */,
                zc->bmi2);
    FORWARD_IF_ERROR(cSeqsSize, "ZSTD_entropyCompressSeqStore failed!");

@@ -3442,8 +4055,9 @@ ZSTD_compressSeqStore_singleBlock(ZSTD_CCtx* zc, seqStore_t* const seqStore,
        cSeqsSize = 1;
    }

+   /* Sequence collection not supported when block splitting */
    if (zc->seqCollector.collectSequences) {
-       ZSTD_copyBlockSequences(zc);
+       FORWARD_IF_ERROR(ZSTD_copyBlockSequences(&zc->seqCollector, seqStore, dRepOriginal.rep), "copyBlockSequences failed");
        ZSTD_blockState_confirmRepcodesAndEntropyTables(&zc->blockState);
        return 0;
    }
@@ -3451,18 +4065,18 @@ ZSTD_compressSeqStore_singleBlock(ZSTD_CCtx* zc, seqStore_t* const seqStore,
    if (cSeqsSize == 0) {
        cSize = ZSTD_noCompressBlock(op, dstCapacity, ip, srcSize, lastBlock);
        FORWARD_IF_ERROR(cSize, "Nocompress block failed");
-       DEBUGLOG(4, "Writing out nocompress block, size: %zu", cSize);
+       DEBUGLOG(5, "Writing out nocompress block, size: %zu", cSize);
        *dRep = dRepOriginal; /* reset simulated decompression repcode history */
    } else if (cSeqsSize == 1) {
        cSize = ZSTD_rleCompressBlock(op, dstCapacity, *ip, srcSize, lastBlock);
        FORWARD_IF_ERROR(cSize, "RLE compress block failed");
-       DEBUGLOG(4, "Writing out RLE block, size: %zu", cSize);
+       DEBUGLOG(5, "Writing out RLE block, size: %zu", cSize);
        *dRep = dRepOriginal; /* reset simulated decompression repcode history */
    } else {
        ZSTD_blockState_confirmRepcodesAndEntropyTables(&zc->blockState);
        writeBlockHeader(op, cSeqsSize, srcSize, lastBlock);
        cSize = ZSTD_blockHeaderSize + cSeqsSize;
-       DEBUGLOG(4, "Writing out compressed block, size: %zu", cSize);
+       DEBUGLOG(5, "Writing out compressed block, size: %zu", cSize);
    }

    if (zc->blockState.prevCBlock->entropy.fse.offcode_repeatMode == FSE_repeat_valid)
@@ -3481,45 +4095,49 @@ typedef struct {

/* Helper function to perform the recursive search for block splits.
 * Estimates the cost of seqStore prior to split, and estimates the cost of splitting the sequences in half.
- * If advantageous to split, then we recurse down the two sub-blocks. If not, or if an error occurred in estimation, then
- * we do not recurse.
+ * If advantageous to split, then we recurse down the two sub-blocks.
+ * If not, or if an error occurred in estimation, then we do not recurse.
 *
- * Note: The recursion depth is capped by a heuristic minimum number of sequences, defined by MIN_SEQUENCES_BLOCK_SPLITTING.
+ * Note: The recursion depth is capped by a heuristic minimum number of sequences,
+ * defined by MIN_SEQUENCES_BLOCK_SPLITTING.
 * In theory, this means the absolute largest recursion depth is 10 == log2(maxNbSeqInBlock/MIN_SEQUENCES_BLOCK_SPLITTING).
 * In practice, recursion depth usually doesn't go beyond 4.
 *
- * Furthermore, the number of splits is capped by ZSTD_MAX_NB_BLOCK_SPLITS. At ZSTD_MAX_NB_BLOCK_SPLITS == 196 with the current existing blockSize
+ * Furthermore, the number of splits is capped by ZSTD_MAX_NB_BLOCK_SPLITS.
+ * At ZSTD_MAX_NB_BLOCK_SPLITS == 196 with the current existing blockSize
 * maximum of 128 KB, this value is actually impossible to reach.
 */
static void
ZSTD_deriveBlockSplitsHelper(seqStoreSplits* splits, size_t startIdx, size_t endIdx,
-                            ZSTD_CCtx* zc, const seqStore_t* origSeqStore)
+                            ZSTD_CCtx* zc, const SeqStore_t* origSeqStore)
{
-   seqStore_t* fullSeqStoreChunk = &zc->blockSplitCtx.fullSeqStoreChunk;
-   seqStore_t* firstHalfSeqStore = &zc->blockSplitCtx.firstHalfSeqStore;
-   seqStore_t* secondHalfSeqStore = &zc->blockSplitCtx.secondHalfSeqStore;
+   SeqStore_t* const fullSeqStoreChunk = &zc->blockSplitCtx.fullSeqStoreChunk;
+   SeqStore_t* const firstHalfSeqStore = &zc->blockSplitCtx.firstHalfSeqStore;
+   SeqStore_t* const secondHalfSeqStore = &zc->blockSplitCtx.secondHalfSeqStore;
    size_t estimatedOriginalSize;
    size_t estimatedFirstHalfSize;
    size_t estimatedSecondHalfSize;
    size_t midIdx = (startIdx + endIdx)/2;

+   DEBUGLOG(5, "ZSTD_deriveBlockSplitsHelper: startIdx=%zu endIdx=%zu", startIdx, endIdx);
+   assert(endIdx >= startIdx);
    if (endIdx - startIdx < MIN_SEQUENCES_BLOCK_SPLITTING || splits->idx >= ZSTD_MAX_NB_BLOCK_SPLITS) {
-       DEBUGLOG(6, "ZSTD_deriveBlockSplitsHelper: Too few sequences");
+       DEBUGLOG(6, "ZSTD_deriveBlockSplitsHelper: Too few sequences (%zu)", endIdx - startIdx);
        return;
    }
-   DEBUGLOG(4, "ZSTD_deriveBlockSplitsHelper: startIdx=%zu endIdx=%zu", startIdx, endIdx);
    ZSTD_deriveSeqStoreChunk(fullSeqStoreChunk, origSeqStore, startIdx, endIdx);
    ZSTD_deriveSeqStoreChunk(firstHalfSeqStore, origSeqStore, startIdx, midIdx);
    ZSTD_deriveSeqStoreChunk(secondHalfSeqStore, origSeqStore, midIdx, endIdx);
    estimatedOriginalSize = ZSTD_buildEntropyStatisticsAndEstimateSubBlockSize(fullSeqStoreChunk, zc);
    estimatedFirstHalfSize = ZSTD_buildEntropyStatisticsAndEstimateSubBlockSize(firstHalfSeqStore, zc);
    estimatedSecondHalfSize = ZSTD_buildEntropyStatisticsAndEstimateSubBlockSize(secondHalfSeqStore, zc);
-   DEBUGLOG(4, "Estimated original block size: %zu -- First half split: %zu -- Second half split: %zu",
+   DEBUGLOG(5, "Estimated original block size: %zu -- First half split: %zu -- Second half split: %zu",
            estimatedOriginalSize, estimatedFirstHalfSize, estimatedSecondHalfSize);
    if (ZSTD_isError(estimatedOriginalSize) || ZSTD_isError(estimatedFirstHalfSize) || ZSTD_isError(estimatedSecondHalfSize)) {
        return;
    }
    if (estimatedFirstHalfSize + estimatedSecondHalfSize < estimatedOriginalSize) {
+       DEBUGLOG(5, "split decided at seqNb:%zu", midIdx);
        ZSTD_deriveBlockSplitsHelper(splits, startIdx, midIdx, zc, origSeqStore);
        splits->splitLocations[splits->idx] = (U32)midIdx;
        splits->idx++;
@@ -3527,14 +4145,18 @@ ZSTD_deriveBlockSplitsHelper(seqStoreSplits* splits, size_t startIdx, size_t end
    }
}

-/* Base recursive function. Populates a table with intra-block partition indices that can improve compression ratio.
+/* Base recursive function.
+ * Populates a table with intra-block partition indices that can improve compression ratio.
 *
- * Returns the number of splits made (which equals the size of the partition table - 1).
+ * @return: number of splits made (which equals the size of the partition table - 1).
 */
-static size_t ZSTD_deriveBlockSplits(ZSTD_CCtx* zc, U32 partitions[], U32 nbSeq) {
-   seqStoreSplits splits = {partitions, 0};
+static size_t ZSTD_deriveBlockSplits(ZSTD_CCtx* zc, U32 partitions[], U32 nbSeq)
+{
+   seqStoreSplits splits;
+   splits.splitLocations = partitions;
+   splits.idx = 0;
    if (nbSeq <= 4) {
-       DEBUGLOG(4, "ZSTD_deriveBlockSplits: Too few sequences to split");
+       DEBUGLOG(5, "ZSTD_deriveBlockSplits: Too few sequences to split (%u <= 4)", nbSeq);
        /* Refuse to try and split anything with less than 4 sequences */
        return 0;
    }
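The helper above amounts to a midpoint bisection driven by the entropy estimator; its shape, with details elided:

    deriveSplits(lo, hi):
        if (hi - lo < MIN_SEQUENCES_BLOCK_SPLITTING) return;
        mid = (lo + hi) / 2;
        if (estimate(lo, mid) + estimate(mid, hi) < estimate(lo, hi)) {
            deriveSplits(lo, mid);
            record(mid);          /* split points emitted in increasing order */
            deriveSplits(mid, hi);
        }

so at most ZSTD_MAX_NB_BLOCK_SPLITS split points are recorded, already sorted for the partition loop that follows.
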
*/ - repcodes_t dRep; - repcodes_t cRep; - ZSTD_memcpy(dRep.rep, zc->blockState.prevCBlock->rep, sizeof(repcodes_= t)); - ZSTD_memcpy(cRep.rep, zc->blockState.prevCBlock->rep, sizeof(repcodes_= t)); - ZSTD_memset(nextSeqStore, 0, sizeof(seqStore_t)); + Repcodes_t dRep; + Repcodes_t cRep; + ZSTD_memcpy(dRep.rep, zc->blockState.prevCBlock->rep, sizeof(Repcodes_= t)); + ZSTD_memcpy(cRep.rep, zc->blockState.prevCBlock->rep, sizeof(Repcodes_= t)); + ZSTD_memset(nextSeqStore, 0, sizeof(SeqStore_t)); =20 - DEBUGLOG(4, "ZSTD_compressBlock_splitBlock_internal (dstCapacity=3D%u,= dictLimit=3D%u, nextToUpdate=3D%u)", + DEBUGLOG(5, "ZSTD_compressBlock_splitBlock_internal (dstCapacity=3D%u,= dictLimit=3D%u, nextToUpdate=3D%u)", (unsigned)dstCapacity, (unsigned)zc->blockState.matchState= .window.dictLimit, (unsigned)zc->blockState.matchState.nextToUpdate); =20 if (numSplits =3D=3D 0) { - size_t cSizeSingleBlock =3D ZSTD_compressSeqStore_singleBlock(zc, = &zc->seqStore, - &dRep, = &cRep, - op, ds= tCapacity, - ip, bl= ockSize, - lastBl= ock, 0 /* isPartition */); + size_t cSizeSingleBlock =3D + ZSTD_compressSeqStore_singleBlock(zc, &zc->seqStore, + &dRep, &cRep, + op, dstCapacity, + ip, blockSize, + lastBlock, 0 /* isPartition */= ); FORWARD_IF_ERROR(cSizeSingleBlock, "Compressing single block from = splitBlock_internal() failed!"); DEBUGLOG(5, "ZSTD_compressBlock_splitBlock_internal: No splits"); - assert(cSizeSingleBlock <=3D ZSTD_BLOCKSIZE_MAX + ZSTD_blockHeader= Size); + assert(zc->blockSizeMax <=3D ZSTD_BLOCKSIZE_MAX); + assert(cSizeSingleBlock <=3D zc->blockSizeMax + ZSTD_blockHeaderSi= ze); return cSizeSingleBlock; } =20 ZSTD_deriveSeqStoreChunk(currSeqStore, &zc->seqStore, 0, partitions[0]= ); for (i =3D 0; i <=3D numSplits; ++i) { - size_t srcBytes; size_t cSizeChunk; U32 const lastPartition =3D (i =3D=3D numSplits); U32 lastBlockEntireSrc =3D 0; =20 - srcBytes =3D ZSTD_countSeqStoreLiteralsBytes(currSeqStore) + ZSTD_= countSeqStoreMatchBytes(currSeqStore); + size_t srcBytes =3D ZSTD_countSeqStoreLiteralsBytes(currSeqStore) = + ZSTD_countSeqStoreMatchBytes(currSeqStore); srcBytesTotal +=3D srcBytes; if (lastPartition) { /* This is the final partition, need to account for possible l= ast literals */ @@ -3621,7 +4246,8 @@ ZSTD_compressBlock_splitBlock_internal(ZSTD_CCtx* zc,= void* dst, size_t dstCapac op, dstCapacity, ip, srcBytes, lastBlockEntireSrc,= 1 /* isPartition */); - DEBUGLOG(5, "Estimated size: %zu actual size: %zu", ZSTD_buildEntr= opyStatisticsAndEstimateSubBlockSize(currSeqStore, zc), cSizeChunk); + DEBUGLOG(5, "Estimated size: %zu vs %zu : actual size", + ZSTD_buildEntropyStatisticsAndEstimateSubBlockSize(cur= rSeqStore, zc), cSizeChunk); FORWARD_IF_ERROR(cSizeChunk, "Compressing chunk failed!"); =20 ip +=3D srcBytes; @@ -3629,12 +4255,12 @@ ZSTD_compressBlock_splitBlock_internal(ZSTD_CCtx* z= c, void* dst, size_t dstCapac dstCapacity -=3D cSizeChunk; cSize +=3D cSizeChunk; *currSeqStore =3D *nextSeqStore; - assert(cSizeChunk <=3D ZSTD_BLOCKSIZE_MAX + ZSTD_blockHeaderSize); + assert(cSizeChunk <=3D zc->blockSizeMax + ZSTD_blockHeaderSize); } - /* cRep and dRep may have diverged during the compression. If so, we u= se the dRep repcodes - * for the next block. + /* cRep and dRep may have diverged during the compression. + * If so, we use the dRep repcodes for the next block. 
*/ - ZSTD_memcpy(zc->blockState.prevCBlock->rep, dRep.rep, sizeof(repcodes_= t)); + ZSTD_memcpy(zc->blockState.prevCBlock->rep, dRep.rep, sizeof(Repcodes_= t)); return cSize; } =20 @@ -3643,21 +4269,20 @@ ZSTD_compressBlock_splitBlock(ZSTD_CCtx* zc, void* dst, size_t dstCapacity, const void* src, size_t srcSize, U32 lastBlo= ck) { - const BYTE* ip =3D (const BYTE*)src; - BYTE* op =3D (BYTE*)dst; U32 nbSeq; size_t cSize; - DEBUGLOG(4, "ZSTD_compressBlock_splitBlock"); - assert(zc->appliedParams.useBlockSplitter =3D=3D ZSTD_ps_enable); + DEBUGLOG(5, "ZSTD_compressBlock_splitBlock"); + assert(zc->appliedParams.postBlockSplitter =3D=3D ZSTD_ps_enable); =20 { const size_t bss =3D ZSTD_buildSeqStore(zc, src, srcSize); FORWARD_IF_ERROR(bss, "ZSTD_buildSeqStore failed"); if (bss =3D=3D ZSTDbss_noCompress) { if (zc->blockState.prevCBlock->entropy.fse.offcode_repeatMode = =3D=3D FSE_repeat_valid) zc->blockState.prevCBlock->entropy.fse.offcode_repeatMode = =3D FSE_repeat_check; - cSize =3D ZSTD_noCompressBlock(op, dstCapacity, ip, srcSize, l= astBlock); + RETURN_ERROR_IF(zc->seqCollector.collectSequences, sequencePro= ducer_failed, "Uncompressible block"); + cSize =3D ZSTD_noCompressBlock(dst, dstCapacity, src, srcSize,= lastBlock); FORWARD_IF_ERROR(cSize, "ZSTD_noCompressBlock failed"); - DEBUGLOG(4, "ZSTD_compressBlock_splitBlock: Nocompress block"); + DEBUGLOG(5, "ZSTD_compressBlock_splitBlock: Nocompress block"); return cSize; } nbSeq =3D (U32)(zc->seqStore.sequences - zc->seqStore.sequencesSta= rt); @@ -3673,9 +4298,9 @@ ZSTD_compressBlock_internal(ZSTD_CCtx* zc, void* dst, size_t dstCapacity, const void* src, size_t srcSize, U32 frame) { - /* This the upper bound for the length of an rle block. - * This isn't the actual upper bound. Finding the real threshold - * needs further investigation. + /* This is an estimated upper bound for the length of an rle block. + * This isn't the actual upper bound. + * Finding the real threshold needs further investigation. */ const U32 rleMaxLength =3D 25; size_t cSize; @@ -3687,11 +4312,15 @@ ZSTD_compressBlock_internal(ZSTD_CCtx* zc, =20 { const size_t bss =3D ZSTD_buildSeqStore(zc, src, srcSize); FORWARD_IF_ERROR(bss, "ZSTD_buildSeqStore failed"); - if (bss =3D=3D ZSTDbss_noCompress) { cSize =3D 0; goto out; } + if (bss =3D=3D ZSTDbss_noCompress) { + RETURN_ERROR_IF(zc->seqCollector.collectSequences, sequencePro= ducer_failed, "Uncompressible block"); + cSize =3D 0; + goto out; + } } =20 if (zc->seqCollector.collectSequences) { - ZSTD_copyBlockSequences(zc); + FORWARD_IF_ERROR(ZSTD_copyBlockSequences(&zc->seqCollector, ZSTD_g= etSeqStore(zc), zc->blockState.prevCBlock->rep), "copyBlockSequences failed= "); ZSTD_blockState_confirmRepcodesAndEntropyTables(&zc->blockState); return 0; } @@ -3702,7 +4331,7 @@ ZSTD_compressBlock_internal(ZSTD_CCtx* zc, &zc->appliedParams, dst, dstCapacity, srcSize, - zc->entropyWorkspace, ENTROPY_WORKSPACE_SIZE /* statically all= ocated in resetCCtx */, + zc->tmpWorkspace, zc->tmpWkspSize /* statically allocated in r= esetCCtx */, zc->bmi2); =20 if (frame && @@ -3767,10 +4396,11 @@ static size_t ZSTD_compressBlock_targetCBlockSize_b= ody(ZSTD_CCtx* zc, * * cSize >=3D blockBound(srcSize): We have expanded the block = too much so * emit an uncompressed block. 
*/ - { - size_t const cSize =3D ZSTD_compressSuperBlock(zc, dst, dstCap= acity, src, srcSize, lastBlock); + { size_t const cSize =3D + ZSTD_compressSuperBlock(zc, dst, dstCapacity, src, srcSize= , lastBlock); if (cSize !=3D ERROR(dstSize_tooSmall)) { - size_t const maxCSize =3D srcSize - ZSTD_minGain(srcSize, = zc->appliedParams.cParams.strategy); + size_t const maxCSize =3D + srcSize - ZSTD_minGain(srcSize, zc->appliedParams.cPar= ams.strategy); FORWARD_IF_ERROR(cSize, "ZSTD_compressSuperBlock failed"); if (cSize !=3D 0 && cSize < maxCSize + ZSTD_blockHeaderSiz= e) { ZSTD_blockState_confirmRepcodesAndEntropyTables(&zc->b= lockState); @@ -3778,7 +4408,7 @@ static size_t ZSTD_compressBlock_targetCBlockSize_bod= y(ZSTD_CCtx* zc, } } } - } + } /* if (bss =3D=3D ZSTDbss_compress)*/ =20 DEBUGLOG(6, "Resorting to ZSTD_noCompressBlock()"); /* Superblock compression failed, attempt to emit a single no compress= block. @@ -3807,7 +4437,7 @@ static size_t ZSTD_compressBlock_targetCBlockSize(ZST= D_CCtx* zc, return cSize; } =20 -static void ZSTD_overflowCorrectIfNeeded(ZSTD_matchState_t* ms, +static void ZSTD_overflowCorrectIfNeeded(ZSTD_MatchState_t* ms, ZSTD_cwksp* ws, ZSTD_CCtx_params const* params, void const* ip, @@ -3831,39 +4461,82 @@ static void ZSTD_overflowCorrectIfNeeded(ZSTD_match= State_t* ms, } } =20 +#include "zstd_preSplit.h" + +static size_t ZSTD_optimalBlockSize(ZSTD_CCtx* cctx, const void* src, size= _t srcSize, size_t blockSizeMax, int splitLevel, ZSTD_strategy strat, S64 s= avings) +{ + /* split level based on compression strategy, from `fast` to `btultra2= ` */ + static const int splitLevels[] =3D { 0, 0, 1, 2, 2, 3, 3, 4, 4, 4 }; + /* note: conservatively only split full blocks (128 KB) currently. + * While it's possible to go lower, let's keep it simple for a first i= mplementation. + * Besides, benefits of splitting are reduced when blocks are already = small. + */ + if (srcSize < 128 KB || blockSizeMax < 128 KB) + return MIN(srcSize, blockSizeMax); + /* do not split incompressible data though: + * require verified savings to allow pre-splitting. + * Note: as a consequence, the first full block is not split. + */ + if (savings < 3) { + DEBUGLOG(6, "don't attempt splitting: savings (%i) too low", (int)= savings); + return 128 KB; + } + /* apply @splitLevel, or use default value (which depends on @strat). + * note that splitting heuristic is still conditioned by @savings >=3D= 3, + * so the first block will not reach this code path */ + if (splitLevel =3D=3D 1) return 128 KB; + if (splitLevel =3D=3D 0) { + assert(ZSTD_fast <=3D strat && strat <=3D ZSTD_btultra2); + splitLevel =3D splitLevels[strat]; + } else { + assert(2 <=3D splitLevel && splitLevel <=3D 6); + splitLevel -=3D 2; + } + return ZSTD_splitBlock(src, blockSizeMax, splitLevel, cctx->tmpWorkspa= ce, cctx->tmpWkspSize); +} + /*! ZSTD_compress_frameChunk() : * Compress a chunk of data into one or multiple blocks. * All blocks will be terminated, all input will be consumed. * Function will issue an error if there is not enough `dstCapacity` to h= old the compressed content. 
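Read as a decision table, ZSTD_optimalBlockSize() behaves roughly as:

    input or blockSizeMax < 128 KB   -> no pre-split, one plain block
    accumulated savings < 3 bytes    -> emit a full 128 KB block
    splitLevel == 1                  -> pre-splitting disabled, full block
    splitLevel == 0                  -> level derived from the strategy table
    splitLevel in 2..6               -> user-chosen aggressiveness (minus 2)

and the caller below refreshes savings after every block as (blockSize - cSize), so incompressible data quickly drops below the threshold and stops being split.
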
 /*! ZSTD_compress_frameChunk() :
 *   Compress a chunk of data into one or multiple blocks.
 *   All blocks will be terminated, all input will be consumed.
 *   Function will issue an error if there is not enough `dstCapacity` to hold the compressed content.
 *   Frame is supposed already started (header already produced)
-*  @return : compressed size, or an error code
+*  @return : compressed size, or an error code */
 static size_t ZSTD_compress_frameChunk(ZSTD_CCtx* cctx,
                                        void* dst, size_t dstCapacity,
                                        const void* src, size_t srcSize,
                                        U32 lastFrameChunk)
 {
-    size_t blockSize = cctx->blockSize;
+    size_t blockSizeMax = cctx->blockSizeMax;
     size_t remaining = srcSize;
     const BYTE* ip = (const BYTE*)src;
     BYTE* const ostart = (BYTE*)dst;
     BYTE* op = ostart;
     U32 const maxDist = (U32)1 << cctx->appliedParams.cParams.windowLog;
+    S64 savings = (S64)cctx->consumedSrcSize - (S64)cctx->producedCSize;
 
     assert(cctx->appliedParams.cParams.windowLog <= ZSTD_WINDOWLOG_MAX);
 
-    DEBUGLOG(4, "ZSTD_compress_frameChunk (blockSize=%u)", (unsigned)blockSize);
+    DEBUGLOG(5, "ZSTD_compress_frameChunk (srcSize=%u, blockSizeMax=%u)", (unsigned)srcSize, (unsigned)blockSizeMax);
     if (cctx->appliedParams.fParams.checksumFlag && srcSize)
         xxh64_update(&cctx->xxhState, src, srcSize);
 
     while (remaining) {
-        ZSTD_matchState_t* const ms = &cctx->blockState.matchState;
-        U32 const lastBlock = lastFrameChunk & (blockSize >= remaining);
-
-        RETURN_ERROR_IF(dstCapacity < ZSTD_blockHeaderSize + MIN_CBLOCK_SIZE,
+        ZSTD_MatchState_t* const ms = &cctx->blockState.matchState;
+        size_t const blockSize = ZSTD_optimalBlockSize(cctx,
+                                        ip, remaining,
+                                        blockSizeMax,
+                                        cctx->appliedParams.preBlockSplitter_level,
+                                        cctx->appliedParams.cParams.strategy,
+                                        savings);
+        U32 const lastBlock = lastFrameChunk & (blockSize == remaining);
+        assert(blockSize <= remaining);
+
+        /* TODO: See 3090. We reduced MIN_CBLOCK_SIZE from 3 to 2 so to compensate we are adding
+         * additional 1. We need to revisit and change this logic to be more consistent */
+        RETURN_ERROR_IF(dstCapacity < ZSTD_blockHeaderSize + MIN_CBLOCK_SIZE + 1,
                         dstSize_tooSmall,
                         "not enough space to store compressed block");
-        if (remaining < blockSize) blockSize = remaining;
 
         ZSTD_overflowCorrectIfNeeded(
             ms, &cctx->workspace, &cctx->appliedParams, ip, ip + blockSize);
@@ -3899,8 +4572,23 @@ static size_t ZSTD_compress_frameChunk(ZSTD_CCtx* cctx,
                 MEM_writeLE24(op, cBlockHeader);
                 cSize += ZSTD_blockHeaderSize;
             }
-        }
-
+        }  /* if (ZSTD_useTargetCBlockSize(&cctx->appliedParams))*/
+
+        /* @savings is employed to ensure that splitting doesn't worsen expansion of incompressible data.
+         * Without splitting, the maximum expansion is 3 bytes per full block.
+         * An adversarial input could attempt to fudge the split detector,
+         * and make it split incompressible data, resulting in more block headers.
+         * Note that, since ZSTD_COMPRESSBOUND() assumes a worst case scenario of 1KB per block,
+         * and the splitter never creates blocks that small (current lower limit is 8 KB),
+         * there is already no risk to expand beyond ZSTD_COMPRESSBOUND() limit.
+         * But if the goal is to not expand by more than 3-bytes per 128 KB full block,
+         * then yes, it becomes possible to make the block splitter oversplit incompressible data.
+         * Using @savings, we enforce an even more conservative condition,
+         * requiring the presence of enough savings (at least 3 bytes) to authorize splitting,
+         * otherwise only full blocks are used.
+         * But being conservative is fine,
+         * since splitting barely compressible blocks is not fruitful anyway */
+        savings += (S64)blockSize - (S64)cSize;
 
         ip += blockSize;
         assert(remaining >= blockSize);
@@ -3919,8 +4607,10 @@ static size_t ZSTD_compress_frameChunk(ZSTD_CCtx* cctx,
 
 
 static size_t ZSTD_writeFrameHeader(void* dst, size_t dstCapacity,
-                                    const ZSTD_CCtx_params* params, U64 pledgedSrcSize, U32 dictID)
-{   BYTE* const op = (BYTE*)dst;
+                                    const ZSTD_CCtx_params* params,
+                                    U64 pledgedSrcSize, U32 dictID)
+{
+    BYTE* const op = (BYTE*)dst;
     U32   const dictIDSizeCodeLength = (dictID>0) + (dictID>=256) + (dictID>=65536);   /* 0-3 */
     U32   const dictIDSizeCode = params->fParams.noDictIDFlag ? 0 : dictIDSizeCodeLength;   /* 0-3 */
     U32   const checksumFlag = params->fParams.checksumFlag>0;
@@ -4001,19 +4691,15 @@ size_t ZSTD_writeLastEmptyBlock(void* dst, size_t dstCapacity)
     }
 }
 
-size_t ZSTD_referenceExternalSequences(ZSTD_CCtx* cctx, rawSeq* seq, size_t nbSeq)
+void ZSTD_referenceExternalSequences(ZSTD_CCtx* cctx, rawSeq* seq, size_t nbSeq)
 {
-    RETURN_ERROR_IF(cctx->stage != ZSTDcs_init, stage_wrong,
-                    "wrong cctx stage");
-    RETURN_ERROR_IF(cctx->appliedParams.ldmParams.enableLdm == ZSTD_ps_enable,
-                    parameter_unsupported,
-                    "incompatible with ldm");
+    assert(cctx->stage == ZSTDcs_init);
+    assert(nbSeq == 0 || cctx->appliedParams.ldmParams.enableLdm != ZSTD_ps_enable);
     cctx->externSeqStore.seq = seq;
     cctx->externSeqStore.size = nbSeq;
     cctx->externSeqStore.capacity = nbSeq;
     cctx->externSeqStore.pos = 0;
     cctx->externSeqStore.posInSequence = 0;
-    return 0;
 }
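A worked example of the @savings accounting above, with illustrative
numbers: if the first full block (131072 bytes) compresses to 60000 bytes,
savings becomes 131072 - 60000 = 71072, and subsequent full blocks become
eligible for pre-splitting. If instead that block is incompressible and
stored raw, cSize is 131075 (the 3-byte block header included), savings
drops by 3, and ZSTD_optimalBlockSize() keeps returning plain 128 KB
blocks, which caps worst-case expansion at 3 bytes per full block.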
 
 
@@ -4022,7 +4708,7 @@ static size_t ZSTD_compressContinue_internal (ZSTD_CCtx* cctx,
         const void* src, size_t srcSize,
         U32 frame, U32 lastFrameChunk)
 {
-    ZSTD_matchState_t* const ms = &cctx->blockState.matchState;
+    ZSTD_MatchState_t* const ms = &cctx->blockState.matchState;
     size_t fhSize = 0;
 
     DEBUGLOG(5, "ZSTD_compressContinue_internal, stage: %u, srcSize: %u",
@@ -4057,7 +4743,7 @@ static size_t ZSTD_compressContinue_internal (ZSTD_CCtx* cctx,
                 src, (BYTE const*)src + srcSize);
     }
 
-    DEBUGLOG(5, "ZSTD_compressContinue_internal (blockSize=%u)", (unsigned)cctx->blockSize);
+    DEBUGLOG(5, "ZSTD_compressContinue_internal (blockSize=%u)", (unsigned)cctx->blockSizeMax);
     {   size_t const cSize = frame ?
                              ZSTD_compress_frameChunk (cctx, dst, dstCapacity, src, srcSize, lastFrameChunk) :
                              ZSTD_compressBlock_internal (cctx, dst, dstCapacity, src, srcSize, 0 /* frame */);
@@ -4078,58 +4764,90 @@ static size_t ZSTD_compressContinue_internal (ZSTD_CCtx* cctx,
     }
 }
 
-size_t ZSTD_compressContinue (ZSTD_CCtx* cctx,
-                             void* dst, size_t dstCapacity,
-                             const void* src, size_t srcSize)
+size_t ZSTD_compressContinue_public(ZSTD_CCtx* cctx,
+                                    void* dst, size_t dstCapacity,
+                                    const void* src, size_t srcSize)
 {
     DEBUGLOG(5, "ZSTD_compressContinue (srcSize=%u)", (unsigned)srcSize);
     return ZSTD_compressContinue_internal(cctx, dst, dstCapacity, src, srcSize, 1 /* frame mode */, 0 /* last chunk */);
 }
 
+/* NOTE: Must just wrap ZSTD_compressContinue_public() */
+size_t ZSTD_compressContinue(ZSTD_CCtx* cctx,
+                             void* dst, size_t dstCapacity,
+                             const void* src, size_t srcSize)
+{
+    return ZSTD_compressContinue_public(cctx, dst, dstCapacity, src, srcSize);
+}
 
-size_t ZSTD_getBlockSize(const ZSTD_CCtx* cctx)
+static size_t ZSTD_getBlockSize_deprecated(const ZSTD_CCtx* cctx)
 {
     ZSTD_compressionParameters const cParams = cctx->appliedParams.cParams;
     assert(!ZSTD_checkCParams(cParams));
-    return MIN (ZSTD_BLOCKSIZE_MAX, (U32)1 << cParams.windowLog);
+    return MIN(cctx->appliedParams.maxBlockSize, (size_t)1 << cParams.windowLog);
 }
 
-size_t ZSTD_compressBlock(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize)
+/* NOTE: Must just wrap ZSTD_getBlockSize_deprecated() */
+size_t ZSTD_getBlockSize(const ZSTD_CCtx* cctx)
+{
+    return ZSTD_getBlockSize_deprecated(cctx);
+}
+
+/* NOTE: Must just wrap ZSTD_compressBlock_deprecated() */
+size_t ZSTD_compressBlock_deprecated(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize)
 {
     DEBUGLOG(5, "ZSTD_compressBlock: srcSize = %u", (unsigned)srcSize);
-    { size_t const blockSizeMax = ZSTD_getBlockSize(cctx);
+    { size_t const blockSizeMax = ZSTD_getBlockSize_deprecated(cctx);
       RETURN_ERROR_IF(srcSize > blockSizeMax, srcSize_wrong, "input is larger than a block"); }
 
     return ZSTD_compressContinue_internal(cctx, dst, dstCapacity, src, srcSize, 0 /* frame mode */, 0 /* last chunk */);
 }
 
+/* NOTE: Must just wrap ZSTD_compressBlock_deprecated() */
+size_t ZSTD_compressBlock(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize)
+{
+    return ZSTD_compressBlock_deprecated(cctx, dst, dstCapacity, src, srcSize);
+}
+
 /*! ZSTD_loadDictionaryContent() :
  *  @return : 0, or an error code
  */
-static size_t ZSTD_loadDictionaryContent(ZSTD_matchState_t* ms,
-                                         ldmState_t* ls,
-                                         ZSTD_cwksp* ws,
-                                         ZSTD_CCtx_params const* params,
-                                         const void* src, size_t srcSize,
-                                         ZSTD_dictTableLoadMethod_e dtlm)
+static size_t
+ZSTD_loadDictionaryContent(ZSTD_MatchState_t* ms,
+                           ldmState_t* ls,
+                           ZSTD_cwksp* ws,
+                           ZSTD_CCtx_params const* params,
+                           const void* src, size_t srcSize,
+                           ZSTD_dictTableLoadMethod_e dtlm,
+                           ZSTD_tableFillPurpose_e tfp)
 {
     const BYTE* ip = (const BYTE*) src;
     const BYTE* const iend = ip + srcSize;
     int const loadLdmDict = params->ldmParams.enableLdm == ZSTD_ps_enable && ls != NULL;
 
-    /* Assert that we the ms params match the params we're being given */
+    /* Assert that the ms params match the params we're being given */
     ZSTD_assertEqualCParams(params->cParams, ms->cParams);
 
-    if (srcSize > ZSTD_CHUNKSIZE_MAX) {
+    {   /* Ensure large dictionaries can't cause index overflow */
+        /* Allow the dictionary to set indices up to exactly ZSTD_CURRENT_MAX.
         * Dictionaries right at the edge will immediately trigger overflow
         * correction, but I don't want to insert extra constraints here. */
-        U32 const maxDictSize = ZSTD_CURRENT_MAX - 1;
-        /* We must have cleared our windows when our source is this large. */
-        assert(ZSTD_window_isEmpty(ms->window));
-        if (loadLdmDict)
-            assert(ZSTD_window_isEmpty(ls->window));
+        U32 maxDictSize = ZSTD_CURRENT_MAX - ZSTD_WINDOW_START_INDEX;
+
+        int const CDictTaggedIndices = ZSTD_CDictIndicesAreTagged(&params->cParams);
+        if (CDictTaggedIndices && tfp == ZSTD_tfp_forCDict) {
+            /* Some dictionary matchfinders in zstd use "short cache",
+             * which treats the lower ZSTD_SHORT_CACHE_TAG_BITS of each
+             * CDict hashtable entry as a tag rather than as part of an index.
+             * When short cache is used, we need to truncate the dictionary
+             * so that its indices don't overlap with the tag. */
+            U32 const shortCacheMaxDictSize = (1u << (32 - ZSTD_SHORT_CACHE_TAG_BITS)) - ZSTD_WINDOW_START_INDEX;
+            maxDictSize = MIN(maxDictSize, shortCacheMaxDictSize);
+            assert(!loadLdmDict);
+        }
+
         /* If the dictionary is too large, only load the suffix of the dictionary. */
         if (srcSize > maxDictSize) {
             ip = iend - maxDictSize;
@@ -4138,35 +4856,59 @@ static size_t ZSTD_loadDictionaryContent(ZSTD_matchState_t* ms,
         }
     }
 
-    DEBUGLOG(4, "ZSTD_loadDictionaryContent(): useRowMatchFinder=%d", (int)params->useRowMatchFinder);
+    if (srcSize > ZSTD_CHUNKSIZE_MAX) {
+        /* We must have cleared our windows when our source is this large. */
+        assert(ZSTD_window_isEmpty(ms->window));
+        if (loadLdmDict) assert(ZSTD_window_isEmpty(ls->window));
+    }
     ZSTD_window_update(&ms->window, src, srcSize, /* forceNonContiguous */ 0);
-    ms->loadedDictEnd = params->forceWindow ? 0 : (U32)(iend - ms->window.base);
-    ms->forceNonContiguous = params->deterministicRefPrefix;
 
-    if (loadLdmDict) {
+    DEBUGLOG(4, "ZSTD_loadDictionaryContent: useRowMatchFinder=%d", (int)params->useRowMatchFinder);
+
+    if (loadLdmDict) { /* Load the entire dict into LDM matchfinders. */
+        DEBUGLOG(4, "ZSTD_loadDictionaryContent: Trigger loadLdmDict");
         ZSTD_window_update(&ls->window, src, srcSize, /* forceNonContiguous */ 0);
         ls->loadedDictEnd = params->forceWindow ? 0 : (U32)(iend - ls->window.base);
+        ZSTD_ldm_fillHashTable(ls, ip, iend, &params->ldmParams);
+        DEBUGLOG(4, "ZSTD_loadDictionaryContent: ZSTD_ldm_fillHashTable completes");
+    }
+
+    /* If the dict is larger than we can reasonably index in our tables, only load the suffix. */
+    {   U32 maxDictSize = 1U << MIN(MAX(params->cParams.hashLog + 3, params->cParams.chainLog + 1), 31);
+        if (srcSize > maxDictSize) {
+            ip = iend - maxDictSize;
+            src = ip;
+            srcSize = maxDictSize;
+    }   }
 
+    ms->nextToUpdate = (U32)(ip - ms->window.base);
+    ms->loadedDictEnd = params->forceWindow ? 0 : (U32)(iend - ms->window.base);
+    ms->forceNonContiguous = params->deterministicRefPrefix;
+
     if (srcSize <= HASH_READ_SIZE) return 0;
 
     ZSTD_overflowCorrectIfNeeded(ms, ws, params, ip, iend);
 
-    if (loadLdmDict)
-        ZSTD_ldm_fillHashTable(ls, ip, iend, &params->ldmParams);
-
     switch(params->cParams.strategy)
     {
     case ZSTD_fast:
-        ZSTD_fillHashTable(ms, iend, dtlm);
+        ZSTD_fillHashTable(ms, iend, dtlm, tfp);
         break;
     case ZSTD_dfast:
-        ZSTD_fillDoubleHashTable(ms, iend, dtlm);
+#ifndef ZSTD_EXCLUDE_DFAST_BLOCK_COMPRESSOR
+        ZSTD_fillDoubleHashTable(ms, iend, dtlm, tfp);
+#else
+        assert(0); /* shouldn't be called: cparams should've been adjusted. */
+#endif
         break;
 
     case ZSTD_greedy:
     case ZSTD_lazy:
    case ZSTD_lazy2:
+#if !defined(ZSTD_EXCLUDE_GREEDY_BLOCK_COMPRESSOR) \
+ || !defined(ZSTD_EXCLUDE_LAZY_BLOCK_COMPRESSOR) \
+ || !defined(ZSTD_EXCLUDE_LAZY2_BLOCK_COMPRESSOR)
         assert(srcSize >= HASH_READ_SIZE);
         if (ms->dedicatedDictSearch) {
             assert(ms->chainTable != NULL);
@@ -4174,7 +4916,7 @@ static size_t ZSTD_loadDictionaryContent(ZSTD_matchState_t* ms,
         } else {
             assert(params->useRowMatchFinder != ZSTD_ps_auto);
             if (params->useRowMatchFinder == ZSTD_ps_enable) {
-                size_t const tagTableSize = ((size_t)1 << params->cParams.hashLog) * sizeof(U16);
+                size_t const tagTableSize = ((size_t)1 << params->cParams.hashLog);
                 ZSTD_memset(ms->tagTable, 0, tagTableSize);
                 ZSTD_row_update(ms, iend-HASH_READ_SIZE);
                 DEBUGLOG(4, "Using row-based hash table for lazy dict");
@@ -4183,14 +4925,24 @@ static size_t ZSTD_loadDictionaryContent(ZSTD_matchState_t* ms,
                 DEBUGLOG(4, "Using chain-based hash table for lazy dict");
             }
         }
+#else
+        assert(0); /* shouldn't be called: cparams should've been adjusted. */
+#endif
         break;
 
     case ZSTD_btlazy2:   /* we want the dictionary table fully sorted */
     case ZSTD_btopt:
     case ZSTD_btultra:
     case ZSTD_btultra2:
+#if !defined(ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR) \
+ || !defined(ZSTD_EXCLUDE_BTOPT_BLOCK_COMPRESSOR) \
+ || !defined(ZSTD_EXCLUDE_BTULTRA_BLOCK_COMPRESSOR)
         assert(srcSize >= HASH_READ_SIZE);
+        DEBUGLOG(4, "Fill %u bytes into the Binary Tree", (unsigned)srcSize);
         ZSTD_updateTree(ms, iend-HASH_READ_SIZE, iend);
+#else
+        assert(0); /* shouldn't be called: cparams should've been adjusted. */
+#endif
         break;
 
     default:
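For scale: assuming upstream's current constants (ZSTD_SHORT_CACHE_TAG_BITS
== 8 and ZSTD_WINDOW_START_INDEX == 2), the short-cache cap computed
earlier in this function works out to
shortCacheMaxDictSize = (1u << 24) - 2 = 16777214 bytes, so a CDict built
from a larger dictionary only indexes its final ~16 MB suffix.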
@@ -4233,20 +4985,19 @@ size_t ZSTD_loadCEntropy(ZSTD_compressedBlockState_t* bs, void* workspace,
     {   unsigned maxSymbolValue = 255;
         unsigned hasZeroWeights = 1;
         size_t const hufHeaderSize = HUF_readCTable((HUF_CElt*)bs->entropy.huf.CTable, &maxSymbolValue, dictPtr,
-            dictEnd-dictPtr, &hasZeroWeights);
+            (size_t)(dictEnd-dictPtr), &hasZeroWeights);
 
         /* We only set the loaded table as valid if it contains all non-zero
          * weights. Otherwise, we set it to check */
-        if (!hasZeroWeights)
+        if (!hasZeroWeights && maxSymbolValue == 255)
             bs->entropy.huf.repeatMode = HUF_repeat_valid;
 
         RETURN_ERROR_IF(HUF_isError(hufHeaderSize), dictionary_corrupted, "");
-        RETURN_ERROR_IF(maxSymbolValue < 255, dictionary_corrupted, "");
         dictPtr += hufHeaderSize;
     }
 
     {   unsigned offcodeLog;
-        size_t const offcodeHeaderSize = FSE_readNCount(offcodeNCount, &offcodeMaxValue, &offcodeLog, dictPtr, dictEnd-dictPtr);
+        size_t const offcodeHeaderSize = FSE_readNCount(offcodeNCount, &offcodeMaxValue, &offcodeLog, dictPtr, (size_t)(dictEnd-dictPtr));
         RETURN_ERROR_IF(FSE_isError(offcodeHeaderSize), dictionary_corrupted, "");
         RETURN_ERROR_IF(offcodeLog > OffFSELog, dictionary_corrupted, "");
         /* fill all offset symbols to avoid garbage at end of table */
@@ -4261,7 +5012,7 @@ size_t ZSTD_loadCEntropy(ZSTD_compressedBlockState_t* bs, void* workspace,
 
     {   short matchlengthNCount[MaxML+1];
         unsigned matchlengthMaxValue = MaxML, matchlengthLog;
-        size_t const matchlengthHeaderSize = FSE_readNCount(matchlengthNCount, &matchlengthMaxValue, &matchlengthLog, dictPtr, dictEnd-dictPtr);
+        size_t const matchlengthHeaderSize = FSE_readNCount(matchlengthNCount, &matchlengthMaxValue, &matchlengthLog, dictPtr, (size_t)(dictEnd-dictPtr));
         RETURN_ERROR_IF(FSE_isError(matchlengthHeaderSize), dictionary_corrupted, "");
         RETURN_ERROR_IF(matchlengthLog > MLFSELog, dictionary_corrupted, "");
         RETURN_ERROR_IF(FSE_isError(FSE_buildCTable_wksp(
@@ -4275,7 +5026,7 @@ size_t ZSTD_loadCEntropy(ZSTD_compressedBlockState_t* bs, void* workspace,
 
     {   short litlengthNCount[MaxLL+1];
         unsigned litlengthMaxValue = MaxLL, litlengthLog;
-        size_t const litlengthHeaderSize = FSE_readNCount(litlengthNCount, &litlengthMaxValue, &litlengthLog, dictPtr, dictEnd-dictPtr);
+        size_t const litlengthHeaderSize = FSE_readNCount(litlengthNCount, &litlengthMaxValue, &litlengthLog, dictPtr, (size_t)(dictEnd-dictPtr));
         RETURN_ERROR_IF(FSE_isError(litlengthHeaderSize), dictionary_corrupted, "");
         RETURN_ERROR_IF(litlengthLog > LLFSELog, dictionary_corrupted, "");
         RETURN_ERROR_IF(FSE_isError(FSE_buildCTable_wksp(
@@ -4309,7 +5060,7 @@ size_t ZSTD_loadCEntropy(ZSTD_compressedBlockState_t* bs, void* workspace,
                 RETURN_ERROR_IF(bs->rep[u] > dictContentSize, dictionary_corrupted, "");
     }   }   }
 
-    return dictPtr - (const BYTE*)dict;
+    return (size_t)(dictPtr - (const BYTE*)dict);
 }
 
 /* Dictionary format :
@@ -4322,11 +5073,12 @@ size_t ZSTD_loadCEntropy(ZSTD_compressedBlockState_t* bs, void* workspace,
 *  dictSize supposed >= 8
 */
 static size_t ZSTD_loadZstdDictionary(ZSTD_compressedBlockState_t* bs,
-                                      ZSTD_matchState_t* ms,
+                                      ZSTD_MatchState_t* ms,
                                       ZSTD_cwksp* ws,
                                       ZSTD_CCtx_params const* params,
                                       const void* dict, size_t dictSize,
                                       ZSTD_dictTableLoadMethod_e dtlm,
+                                      ZSTD_tableFillPurpose_e tfp,
                                       void* workspace)
 {
     const BYTE* dictPtr = (const BYTE*)dict;
@@ -4345,7 +5097,7 @@ static size_t ZSTD_loadZstdDictionary(ZSTD_compressedBlockState_t* bs,
     {   size_t const dictContentSize = (size_t)(dictEnd - dictPtr);
         FORWARD_IF_ERROR(ZSTD_loadDictionaryContent(
-            ms, NULL, ws, params, dictPtr, dictContentSize, dtlm), "");
+            ms, NULL, ws, params, dictPtr, dictContentSize, dtlm, tfp), "");
     }
     return dictID;
 }
 
@@ -4354,13 +5106,14 @@ static size_t ZSTD_loadZstdDictionary(ZSTD_compressedBlockState_t* bs,
 * @return : dictID, or an error code */
 static size_t
 ZSTD_compress_insertDictionary(ZSTD_compressedBlockState_t* bs,
-                               ZSTD_matchState_t* ms,
+                               ZSTD_MatchState_t* ms,
                                ldmState_t* ls,
                                ZSTD_cwksp* ws,
                                const ZSTD_CCtx_params* params,
                                const void* dict, size_t dictSize,
                                ZSTD_dictContentType_e dictContentType,
                                ZSTD_dictTableLoadMethod_e dtlm,
+                               ZSTD_tableFillPurpose_e tfp,
                                void* workspace)
 {
     DEBUGLOG(4, "ZSTD_compress_insertDictionary (dictSize=%u)", (U32)dictSize);
@@ -4373,13 +5126,13 @@ ZSTD_compress_insertDictionary(ZSTD_compressedBlockState_t* bs,
 
     /* dict restricted modes */
     if (dictContentType == ZSTD_dct_rawContent)
-        return ZSTD_loadDictionaryContent(ms, ls, ws, params, dict, dictSize, dtlm);
+        return ZSTD_loadDictionaryContent(ms, ls, ws, params, dict, dictSize, dtlm, tfp);
 
     if (MEM_readLE32(dict) != ZSTD_MAGIC_DICTIONARY) {
         if (dictContentType == ZSTD_dct_auto) {
             DEBUGLOG(4, "raw content dictionary detected");
             return ZSTD_loadDictionaryContent(
-                ms, ls, ws, params, dict, dictSize, dtlm);
+                ms, ls, ws, params, dict, dictSize, dtlm, tfp);
         }
         RETURN_ERROR_IF(dictContentType == ZSTD_dct_fullDict, dictionary_wrong, "");
         assert(0);   /* impossible */
@@ -4387,13 +5140,14 @@ ZSTD_compress_insertDictionary(ZSTD_compressedBlockState_t* bs,
 
     /* dict as full zstd dictionary */
     return ZSTD_loadZstdDictionary(
-        bs, ms, ws, params, dict, dictSize, dtlm, workspace);
+        bs, ms, ws, params, dict, dictSize, dtlm, tfp, workspace);
 }
 
 #define ZSTD_USE_CDICT_PARAMS_SRCSIZE_CUTOFF (128 KB)
 #define ZSTD_USE_CDICT_PARAMS_DICTSIZE_MULTIPLIER (6ULL)
 
 /*! ZSTD_compressBegin_internal() :
+ * Assumption : either @dict OR @cdict (or none) is non-NULL, never both
  * @return : 0, or an error code */
 static size_t ZSTD_compressBegin_internal(ZSTD_CCtx* cctx,
                                     const void* dict, size_t dictSize,
@@ -4426,11 +5180,11 @@ static size_t ZSTD_compressBegin_internal(ZSTD_CCtx* cctx,
                         cctx->blockState.prevCBlock, &cctx->blockState.matchState,
                         &cctx->ldmState, &cctx->workspace, &cctx->appliedParams, cdict->dictContent,
                         cdict->dictContentSize, cdict->dictContentType, dtlm,
-                        cctx->entropyWorkspace)
+                        ZSTD_tfp_forCCtx, cctx->tmpWorkspace)
                   : ZSTD_compress_insertDictionary(
                         cctx->blockState.prevCBlock, &cctx->blockState.matchState,
                         &cctx->ldmState, &cctx->workspace, &cctx->appliedParams, dict, dictSize,
-                        dictContentType, dtlm, cctx->entropyWorkspace);
+                        dictContentType, dtlm, ZSTD_tfp_forCCtx, cctx->tmpWorkspace);
         FORWARD_IF_ERROR(dictID, "ZSTD_compress_insertDictionary failed");
         assert(dictID <= UINT_MAX);
         cctx->dictID = (U32)dictID;
@@ -4471,11 +5225,11 @@ size_t ZSTD_compressBegin_advanced(ZSTD_CCtx* cctx,
                                        &cctxParams, pledgedSrcSize);
 }
 
-size_t ZSTD_compressBegin_usingDict(ZSTD_CCtx* cctx, const void* dict, size_t dictSize, int compressionLevel)
+static size_t
+ZSTD_compressBegin_usingDict_deprecated(ZSTD_CCtx* cctx, const void* dict, size_t dictSize, int compressionLevel)
 {
     ZSTD_CCtx_params cctxParams;
-    {
-        ZSTD_parameters const params = ZSTD_getParams_internal(compressionLevel, ZSTD_CONTENTSIZE_UNKNOWN, dictSize, ZSTD_cpm_noAttachDict);
+    {   ZSTD_parameters const params = ZSTD_getParams_internal(compressionLevel, ZSTD_CONTENTSIZE_UNKNOWN, dictSize, ZSTD_cpm_noAttachDict);
         ZSTD_CCtxParams_init_internal(&cctxParams, &params, (compressionLevel == 0) ? ZSTD_CLEVEL_DEFAULT : compressionLevel);
     }
     DEBUGLOG(4, "ZSTD_compressBegin_usingDict (dictSize=%u)", (unsigned)dictSize);
@@ -4483,9 +5237,15 @@ size_t ZSTD_compressBegin_usingDict(ZSTD_CCtx* cctx, const void* dict, size_t di
                                        &cctxParams, ZSTD_CONTENTSIZE_UNKNOWN, ZSTDb_not_buffered);
 }
 
+size_t
+ZSTD_compressBegin_usingDict(ZSTD_CCtx* cctx, const void* dict, size_t dictSize, int compressionLevel)
+{
+    return ZSTD_compressBegin_usingDict_deprecated(cctx, dict, dictSize, compressionLevel);
+}
+
 size_t ZSTD_compressBegin(ZSTD_CCtx* cctx, int compressionLevel)
 {
-    return ZSTD_compressBegin_usingDict(cctx, NULL, 0, compressionLevel);
+    return ZSTD_compressBegin_usingDict_deprecated(cctx, NULL, 0, compressionLevel);
 }
 
 
@@ -4496,14 +5256,13 @@ static size_t ZSTD_writeEpilogue(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity)
 {
     BYTE* const ostart = (BYTE*)dst;
     BYTE* op = ostart;
-    size_t fhSize = 0;
 
     DEBUGLOG(4, "ZSTD_writeEpilogue");
     RETURN_ERROR_IF(cctx->stage == ZSTDcs_created, stage_wrong, "init missing");
 
     /* special case : empty frame */
     if (cctx->stage == ZSTDcs_init) {
-        fhSize = ZSTD_writeFrameHeader(dst, dstCapacity, &cctx->appliedParams, 0, 0);
+        size_t fhSize = ZSTD_writeFrameHeader(dst, dstCapacity, &cctx->appliedParams, 0, 0);
         FORWARD_IF_ERROR(fhSize, "ZSTD_writeFrameHeader failed");
         dstCapacity -= fhSize;
         op += fhSize;
@@ -4513,8 +5272,9 @@ static size_t ZSTD_writeEpilogue(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity)
     if (cctx->stage != ZSTDcs_ending) {
         /* write one last empty block, make it the "last" block */
         U32 const cBlockHeader24 = 1 /* last block */ + (((U32)bt_raw)<<1) + 0;
-        RETURN_ERROR_IF(dstCapacity<4, dstSize_tooSmall, "no room for epilogue");
-        MEM_writeLE32(op, cBlockHeader24);
+        ZSTD_STATIC_ASSERT(ZSTD_BLOCKHEADERSIZE == 3);
+        RETURN_ERROR_IF(dstCapacity<3, dstSize_tooSmall, "no room for epilogue");
+        MEM_writeLE24(op, cBlockHeader24);
         op += ZSTD_blockHeaderSize;
         dstCapacity -= ZSTD_blockHeaderSize;
     }
@@ -4528,7 +5288,7 @@ static size_t ZSTD_writeEpilogue(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity)
     }
 
     cctx->stage = ZSTDcs_created;  /* return to "created but no init" status */
-    return op-ostart;
+    return (size_t)(op-ostart);
 }
 
 void ZSTD_CCtx_trace(ZSTD_CCtx* cctx, size_t extraCSize)
@@ -4537,9 +5297,9 @@ void ZSTD_CCtx_trace(ZSTD_CCtx* cctx, size_t extraCSize)
     (void)extraCSize;
 }
 
-size_t ZSTD_compressEnd (ZSTD_CCtx* cctx,
-                         void* dst, size_t dstCapacity,
-                         const void* src, size_t srcSize)
+size_t ZSTD_compressEnd_public(ZSTD_CCtx* cctx,
+                               void* dst, size_t dstCapacity,
+                               const void* src, size_t srcSize)
 {
     size_t endResult;
     size_t const cSize = ZSTD_compressContinue_internal(cctx,
@@ -4563,6 +5323,14 @@ size_t ZSTD_compressEnd (ZSTD_CCtx* cctx,
     return cSize + endResult;
 }
 
+/* NOTE: Must just wrap ZSTD_compressEnd_public() */
+size_t ZSTD_compressEnd(ZSTD_CCtx* cctx,
+                        void* dst, size_t dstCapacity,
+                        const void* src, size_t srcSize)
+{
+    return ZSTD_compressEnd_public(cctx, dst, dstCapacity, src, srcSize);
+}
+
 size_t ZSTD_compress_advanced (ZSTD_CCtx* cctx,
                                void* dst, size_t dstCapacity,
                                const void* src, size_t srcSize,
@@ -4591,7 +5359,7 @@ size_t ZSTD_compress_advanced_internal(
     FORWARD_IF_ERROR( ZSTD_compressBegin_internal(cctx,
                          dict, dictSize, ZSTD_dct_auto, ZSTD_dtlm_fast, NULL,
                          params, srcSize, ZSTDb_not_buffered) , "");
-    return ZSTD_compressEnd(cctx, dst, dstCapacity, src, srcSize);
+    return ZSTD_compressEnd_public(cctx, dst, dstCapacity, src, srcSize);
 }
 
 size_t ZSTD_compress_usingDict(ZSTD_CCtx* cctx,
@@ -4709,7 +5477,7 @@ static size_t ZSTD_initCDict_internal(
     {   size_t const dictID = ZSTD_compress_insertDictionary(
                 &cdict->cBlockState, &cdict->matchState, NULL, &cdict->workspace,
                 &params, cdict->dictContent, cdict->dictContentSize,
-                dictContentType, ZSTD_dtlm_full, cdict->entropyWorkspace);
+                dictContentType, ZSTD_dtlm_full, ZSTD_tfp_forCDict, cdict->entropyWorkspace);
         FORWARD_IF_ERROR(dictID, "ZSTD_compress_insertDictionary failed");
         assert(dictID <= (size_t)(U32)-1);
         cdict->dictID = (U32)dictID;
@@ -4719,14 +5487,16 @@ static size_t ZSTD_initCDict_internal(
     return 0;
 }
 
-static ZSTD_CDict* ZSTD_createCDict_advanced_internal(size_t dictSize,
-                                      ZSTD_dictLoadMethod_e dictLoadMethod,
-                                      ZSTD_compressionParameters cParams,
-                                      ZSTD_paramSwitch_e useRowMatchFinder,
-                                      U32 enableDedicatedDictSearch,
-                                      ZSTD_customMem customMem)
+static ZSTD_CDict*
+ZSTD_createCDict_advanced_internal(size_t dictSize,
+                                   ZSTD_dictLoadMethod_e dictLoadMethod,
+                                   ZSTD_compressionParameters cParams,
+                                   ZSTD_ParamSwitch_e useRowMatchFinder,
+                                   int enableDedicatedDictSearch,
+                                   ZSTD_customMem customMem)
 {
     if ((!customMem.customAlloc) ^ (!customMem.customFree)) return NULL;
+    DEBUGLOG(3, "ZSTD_createCDict_advanced_internal (dictSize=%u)", (unsigned)dictSize);
 
     {   size_t const workspaceSize =
             ZSTD_cwksp_alloc_size(sizeof(ZSTD_CDict)) +
@@ -4763,6 +5533,7 @@ ZSTD_CDict* ZSTD_createCDict_advanced(const void* dictBuffer, size_t dictSize,
 {
     ZSTD_CCtx_params cctxParams;
     ZSTD_memset(&cctxParams, 0, sizeof(cctxParams));
+    DEBUGLOG(3, "ZSTD_createCDict_advanced, dictSize=%u, mode=%u", (unsigned)dictSize, (unsigned)dictContentType);
     ZSTD_CCtxParams_init(&cctxParams, 0);
     cctxParams.cParams = cParams;
     cctxParams.customMem = customMem;
@@ -4783,7 +5554,7 @@ ZSTD_CDict* ZSTD_createCDict_advanced2(
     ZSTD_compressionParameters cParams;
     ZSTD_CDict* cdict;
 
-    DEBUGLOG(3, "ZSTD_createCDict_advanced2, mode %u", (unsigned)dictContentType);
+    DEBUGLOG(3, "ZSTD_createCDict_advanced2, dictSize=%u, mode=%u", (unsigned)dictSize, (unsigned)dictContentType);
     if (!customMem.customAlloc ^ !customMem.customFree) return NULL;
 
     if (cctxParams.enableDedicatedDictSearch) {
@@ -4802,7 +5573,7 @@ ZSTD_CDict* ZSTD_createCDict_advanced2(
             &cctxParams, ZSTD_CONTENTSIZE_UNKNOWN, dictSize, ZSTD_cpm_createCDict);
     }
 
-    DEBUGLOG(3, "ZSTD_createCDict_advanced2: DDS: %u", cctxParams.enableDedicatedDictSearch);
+    DEBUGLOG(3, "ZSTD_createCDict_advanced2: DedicatedDictSearch=%u", cctxParams.enableDedicatedDictSearch);
     cctxParams.cParams = cParams;
     cctxParams.useRowMatchFinder = ZSTD_resolveRowMatchFinderMode(cctxParams.useRowMatchFinder, &cParams);
 
@@ -4810,10 +5581,8 @@ ZSTD_CDict* ZSTD_createCDict_advanced2(
                         dictLoadMethod, cctxParams.cParams,
                         cctxParams.useRowMatchFinder, cctxParams.enableDedicatedDictSearch,
                         customMem);
-    if (!cdict)
-        return NULL;
 
-    if (ZSTD_isError( ZSTD_initCDict_internal(cdict,
+    if (!cdict || ZSTD_isError( ZSTD_initCDict_internal(cdict,
                                     dict, dictSize,
                                     dictLoadMethod, dictContentType,
                                     cctxParams) )) {
@@ -4867,7 +5636,7 @@ size_t ZSTD_freeCDict(ZSTD_CDict* cdict)
  *  workspaceSize: Use ZSTD_estimateCDictSize()
  *                 to determine how large workspace must be.
  *  cParams : use ZSTD_getCParams() to transform a compression level
- *            into its relevants cParams.
+ *            into its relevant cParams.
 *  @return : pointer to ZSTD_CDict*, or NULL if error (size too small)
 *  Note : there is no corresponding "free" function.
 *  Since workspace was allocated externally, it must be freed externally.
@@ -4879,7 +5648,7 @@ const ZSTD_CDict* ZSTD_initStaticCDict(
                                  ZSTD_dictContentType_e dictContentType,
                                  ZSTD_compressionParameters cParams)
 {
-    ZSTD_paramSwitch_e const useRowMatchFinder = ZSTD_resolveRowMatchFinderMode(ZSTD_ps_auto, &cParams);
+    ZSTD_ParamSwitch_e const useRowMatchFinder = ZSTD_resolveRowMatchFinderMode(ZSTD_ps_auto, &cParams);
     /* enableDedicatedDictSearch == 1 ensures matchstate is not too small in case this CDict will be used for DDS + row hash */
     size_t const matchStateSize = ZSTD_sizeof_matchState(&cParams, useRowMatchFinder, /* enableDedicatedDictSearch */ 1, /* forCCtx */ 0);
     size_t const neededSize = ZSTD_cwksp_alloc_size(sizeof(ZSTD_CDict))
@@ -4890,6 +5659,7 @@ const ZSTD_CDict* ZSTD_initStaticCDict(
     ZSTD_CDict* cdict;
     ZSTD_CCtx_params params;
 
+    DEBUGLOG(4, "ZSTD_initStaticCDict (dictSize==%u)", (unsigned)dictSize);
     if ((size_t)workspace & 7) return NULL;  /* 8-aligned */
 
     {
@@ -4900,14 +5670,13 @@ const ZSTD_CDict* ZSTD_initStaticCDict(
         ZSTD_cwksp_move(&cdict->workspace, &ws);
     }
 
-    DEBUGLOG(4, "(workspaceSize < neededSize) : (%u < %u) => %u",
-        (unsigned)workspaceSize, (unsigned)neededSize, (unsigned)(workspaceSize < neededSize));
     if (workspaceSize < neededSize) return NULL;
 
     ZSTD_CCtxParams_init(&params, 0);
     params.cParams = cParams;
     params.useRowMatchFinder = useRowMatchFinder;
     cdict->useRowMatchFinder = useRowMatchFinder;
+    cdict->compressionLevel = ZSTD_NO_CLEVEL;
 
     if (ZSTD_isError( ZSTD_initCDict_internal(cdict,
                                               dict, dictSize,
@@ -4987,12 +5756,17 @@ size_t ZSTD_compressBegin_usingCDict_advanced(
 
 /* ZSTD_compressBegin_usingCDict() :
 * cdict must be != NULL */
-size_t ZSTD_compressBegin_usingCDict(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict)
+size_t ZSTD_compressBegin_usingCDict_deprecated(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict)
 {
     ZSTD_frameParameters const fParams = { 0 /*content*/, 0 /*checksum*/, 0 /*noDictID*/ };
     return ZSTD_compressBegin_usingCDict_internal(cctx, cdict, fParams, ZSTD_CONTENTSIZE_UNKNOWN);
 }
 
+size_t ZSTD_compressBegin_usingCDict(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict)
+{
+    return ZSTD_compressBegin_usingCDict_deprecated(cctx, cdict);
+}
+
 /*! ZSTD_compress_usingCDict_internal():
  * Implementation of various ZSTD_compress_usingCDict* functions.
 */
@@ -5002,7 +5776,7 @@ static size_t ZSTD_compress_usingCDict_internal(ZSTD_CCtx* cctx,
                                 const ZSTD_CDict* cdict, ZSTD_frameParameters fParams)
 {
     FORWARD_IF_ERROR(ZSTD_compressBegin_usingCDict_internal(cctx, cdict, fParams, srcSize), ""); /* will check if cdict != NULL */
-    return ZSTD_compressEnd(cctx, dst, dstCapacity, src, srcSize);
+    return ZSTD_compressEnd_public(cctx, dst, dstCapacity, src, srcSize);
 }
 
 /*! ZSTD_compress_usingCDict_advanced():
@@ -5068,7 +5842,7 @@ size_t ZSTD_CStreamOutSize(void)
     return ZSTD_compressBound(ZSTD_BLOCKSIZE_MAX) + ZSTD_blockHeaderSize + 4 /* 32-bits hash */ ;
 }
 
-static ZSTD_cParamMode_e ZSTD_getCParamMode(ZSTD_CDict const* cdict, ZSTD_CCtx_params const* params, U64 pledgedSrcSize)
+static ZSTD_CParamMode_e ZSTD_getCParamMode(ZSTD_CDict const* cdict, ZSTD_CCtx_params const* params, U64 pledgedSrcSize)
 {
     if (cdict != NULL && ZSTD_shouldAttachDict(cdict, params, pledgedSrcSize))
         return ZSTD_cpm_attachDict;
@@ -5199,30 +5973,41 @@ size_t ZSTD_initCStream(ZSTD_CStream* zcs, int compressionLevel)
 
 static size_t ZSTD_nextInputSizeHint(const ZSTD_CCtx* cctx)
 {
-    size_t hintInSize = cctx->inBuffTarget - cctx->inBuffPos;
-    if (hintInSize==0) hintInSize = cctx->blockSize;
-    return hintInSize;
+    if (cctx->appliedParams.inBufferMode == ZSTD_bm_stable) {
+        return cctx->blockSizeMax - cctx->stableIn_notConsumed;
+    }
+    assert(cctx->appliedParams.inBufferMode == ZSTD_bm_buffered);
+    {   size_t hintInSize = cctx->inBuffTarget - cctx->inBuffPos;
+        if (hintInSize==0) hintInSize = cctx->blockSizeMax;
+        return hintInSize;
+    }
 }
 
 /* ZSTD_compressStream_generic():
 *  internal function for all *compressStream*() variants
- *  non-static, because can be called from zstdmt_compress.c
- * @return : hint size for next input */
+ * @return : hint size for next input to complete ongoing block */
 static size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
                                           ZSTD_outBuffer* output,
                                           ZSTD_inBuffer* input,
                                           ZSTD_EndDirective const flushMode)
 {
-    const char* const istart = (const char*)input->src;
-    const char* const iend = input->size != 0 ? istart + input->size : istart;
-    const char* ip = input->pos != 0 ? istart + input->pos : istart;
-    char* const ostart = (char*)output->dst;
-    char* const oend = output->size != 0 ? ostart + output->size : ostart;
-    char* op = output->pos != 0 ? ostart + output->pos : ostart;
+    const char* const istart = (assert(input != NULL), (const char*)input->src);
+    const char* const iend = (istart != NULL) ? istart + input->size : istart;
+    const char* ip = (istart != NULL) ? istart + input->pos : istart;
+    char* const ostart = (assert(output != NULL), (char*)output->dst);
+    char* const oend = (ostart != NULL) ? ostart + output->size : ostart;
+    char* op = (ostart != NULL) ? ostart + output->pos : ostart;
     U32 someMoreWork = 1;
 
     /* check expectations */
-    DEBUGLOG(5, "ZSTD_compressStream_generic, flush=%u", (unsigned)flushMode);
+    DEBUGLOG(5, "ZSTD_compressStream_generic, flush=%i, srcSize = %zu", (int)flushMode, input->size - input->pos);
+    assert(zcs != NULL);
+    if (zcs->appliedParams.inBufferMode == ZSTD_bm_stable) {
+        assert(input->pos >= zcs->stableIn_notConsumed);
+        input->pos -= zcs->stableIn_notConsumed;
+        if (ip) ip -= zcs->stableIn_notConsumed;
+        zcs->stableIn_notConsumed = 0;
+    }
     if (zcs->appliedParams.inBufferMode == ZSTD_bm_buffered) {
         assert(zcs->inBuff != NULL);
         assert(zcs->inBuffSize > 0);
@@ -5231,8 +6016,10 @@ static size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
         assert(zcs->outBuff != NULL);
         assert(zcs->outBuffSize > 0);
     }
-    assert(output->pos <= output->size);
+    if (input->src == NULL) assert(input->size == 0);
     assert(input->pos <= input->size);
+    if (output->dst == NULL) assert(output->size == 0);
+    assert(output->pos <= output->size);
     assert((U32)flushMode <= (U32)ZSTD_e_end);
 
     while (someMoreWork) {
@@ -5243,12 +6030,13 @@ static size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
 
         case zcss_load:
             if ( (flushMode == ZSTD_e_end)
-              && ( (size_t)(oend-op) >= ZSTD_compressBound(iend-ip)     /* Enough output space */
+              && ( (size_t)(oend-op) >= ZSTD_compressBound((size_t)(iend-ip))  /* Enough output space */
                 || zcs->appliedParams.outBufferMode == ZSTD_bm_stable)  /* OR we are allowed to return dstSizeTooSmall */
              && (zcs->inBuffPos == 0) ) {
                /* shortcut to compression pass directly into output buffer */
-                size_t const cSize = ZSTD_compressEnd(zcs,
-                                                op, oend-op, ip, iend-ip);
+                size_t const cSize = ZSTD_compressEnd_public(zcs,
+                                                op, (size_t)(oend-op),
+                                                ip, (size_t)(iend-ip));
                DEBUGLOG(4, "ZSTD_compressEnd : cSize=%u", (unsigned)cSize);
                FORWARD_IF_ERROR(cSize, "ZSTD_compressEnd failed");
                ip = iend;
@@ -5262,10 +6050,9 @@ static size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
                 size_t const toLoad = zcs->inBuffTarget - zcs->inBuffPos;
                 size_t const loaded = ZSTD_limitCopy(
                                         zcs->inBuff + zcs->inBuffPos, toLoad,
-                                        ip, iend-ip);
+                                        ip, (size_t)(iend-ip));
                 zcs->inBuffPos += loaded;
-                if (loaded != 0)
-                    ip += loaded;
+                if (ip) ip += loaded;
                 if ( (flushMode == ZSTD_e_continue)
                   && (zcs->inBuffPos < zcs->inBuffTarget) ) {
                     /* not enough input to fill full block : stop here */
@@ -5276,16 +6063,29 @@ static size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
                     /* empty */
                     someMoreWork = 0; break;
                 }
+            } else {
+                assert(zcs->appliedParams.inBufferMode == ZSTD_bm_stable);
+                if ( (flushMode == ZSTD_e_continue)
+                  && ( (size_t)(iend - ip) < zcs->blockSizeMax) ) {
+                    /* can't compress a full block : stop here */
+                    zcs->stableIn_notConsumed = (size_t)(iend - ip);
+                    ip = iend;  /* pretend to have consumed input */
+                    someMoreWork = 0; break;
+                }
+                if ( (flushMode == ZSTD_e_flush)
+                  && (ip == iend) ) {
+                    /* empty */
+                    someMoreWork = 0; break;
+                }
             }
             /* compress current block (note : this stage cannot be stopped in the middle) */
             DEBUGLOG(5, "stream compression stage (flushMode==%u)", flushMode);
             {   int const inputBuffered = (zcs->appliedParams.inBufferMode == ZSTD_bm_buffered);
                 void* cDst;
                 size_t cSize;
-                size_t oSize = oend-op;
-                size_t const iSize = inputBuffered
-                    ? zcs->inBuffPos - zcs->inToCompress
-                    : MIN((size_t)(iend - ip), zcs->blockSize);
+                size_t oSize = (size_t)(oend-op);
+                size_t const iSize = inputBuffered ? zcs->inBuffPos - zcs->inToCompress
+                                                   : MIN((size_t)(iend - ip), zcs->blockSizeMax);
                 if (oSize >= ZSTD_compressBound(iSize) || zcs->appliedParams.outBufferMode == ZSTD_bm_stable)
                     cDst = op;   /* compress into output buffer, to skip flush stage */
                 else
@@ -5293,34 +6093,31 @@ static size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
                 if (inputBuffered) {
                     unsigned const lastBlock = (flushMode == ZSTD_e_end) && (ip==iend);
                     cSize = lastBlock ?
-                            ZSTD_compressEnd(zcs, cDst, oSize,
+                            ZSTD_compressEnd_public(zcs, cDst, oSize,
                                         zcs->inBuff + zcs->inToCompress, iSize) :
-                            ZSTD_compressContinue(zcs, cDst, oSize,
+                            ZSTD_compressContinue_public(zcs, cDst, oSize,
                                         zcs->inBuff + zcs->inToCompress, iSize);
                     FORWARD_IF_ERROR(cSize, "%s", lastBlock ? "ZSTD_compressEnd failed" : "ZSTD_compressContinue failed");
                     zcs->frameEnded = lastBlock;
                     /* prepare next block */
-                    zcs->inBuffTarget = zcs->inBuffPos + zcs->blockSize;
+                    zcs->inBuffTarget = zcs->inBuffPos + zcs->blockSizeMax;
                     if (zcs->inBuffTarget > zcs->inBuffSize)
-                        zcs->inBuffPos = 0, zcs->inBuffTarget = zcs->blockSize;
+                        zcs->inBuffPos = 0, zcs->inBuffTarget = zcs->blockSizeMax;
                     DEBUGLOG(5, "inBuffTarget:%u / inBuffSize:%u",
                              (unsigned)zcs->inBuffTarget, (unsigned)zcs->inBuffSize);
                     if (!lastBlock)
                         assert(zcs->inBuffTarget <= zcs->inBuffSize);
                     zcs->inToCompress = zcs->inBuffPos;
-                } else {
-                    unsigned const lastBlock = (ip + iSize == iend);
-                    assert(flushMode == ZSTD_e_end /* Already validated */);
+                } else { /* !inputBuffered, hence ZSTD_bm_stable */
+                    unsigned const lastBlock = (flushMode == ZSTD_e_end) && (ip + iSize == iend);
                     cSize = lastBlock ?
-                            ZSTD_compressEnd(zcs, cDst, oSize, ip, iSize) :
-                            ZSTD_compressContinue(zcs, cDst, oSize, ip, iSize);
+                            ZSTD_compressEnd_public(zcs, cDst, oSize, ip, iSize) :
+                            ZSTD_compressContinue_public(zcs, cDst, oSize, ip, iSize);
                     /* Consume the input prior to error checking to mirror buffered mode. */
-                    if (iSize > 0)
-                        ip += iSize;
+                    if (ip) ip += iSize;
                     FORWARD_IF_ERROR(cSize, "%s", lastBlock ? "ZSTD_compressEnd failed" : "ZSTD_compressContinue failed");
                     zcs->frameEnded = lastBlock;
-                    if (lastBlock)
-                        assert(ip == iend);
+                    if (lastBlock) assert(ip == iend);
                 }
                 if (cDst == op) {  /* no need to flush */
                     op += cSize;
@@ -5369,8 +6166,8 @@ static size_t ZSTD_compressStream_generic(ZSTD_CStream* zcs,
         }
     }
 
-    input->pos = ip - istart;
-    output->pos = op - ostart;
+    input->pos = (size_t)(ip - istart);
+    output->pos = (size_t)(op - ostart);
     if (zcs->frameEnded) return 0;
     return ZSTD_nextInputSizeHint(zcs);
 }
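For callers of the stable-buffer path above, usage looks roughly like the
following sketch (upstream's public API, with made-up buffer variables
srcBuffer/srcSize/dstBuffer/dstCapacity; the kernel wrappers differ):

    /* Sketch: compress one stable input buffer in a single pass.
     * With ZSTD_c_stableInBuffer set, src must not move between calls. */
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_stableInBuffer, 1);
    {   ZSTD_inBuffer input = { srcBuffer, srcSize, 0 };
        ZSTD_outBuffer output = { dstBuffer, dstCapacity, 0 };
        size_t remaining;
        do {
            remaining = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_end);
        } while (remaining != 0 && !ZSTD_isError(remaining));
    }
    ZSTD_freeCCtx(cctx);
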
@@ -5390,8 +6187,10 @@ size_t ZSTD_compressStream(ZSTD_CStream* zcs, ZSTD_outBuffer* output, ZSTD_inBuf
 
 /* After a compression call set the expected input/output buffer.
  * This is validated at the start of the next compression call.
  */
-static void ZSTD_setBufferExpectations(ZSTD_CCtx* cctx, ZSTD_outBuffer const* output, ZSTD_inBuffer const* input)
+static void
+ZSTD_setBufferExpectations(ZSTD_CCtx* cctx, const ZSTD_outBuffer* output, const ZSTD_inBuffer* input)
 {
+    DEBUGLOG(5, "ZSTD_setBufferExpectations (for advanced stable in/out modes)");
     if (cctx->appliedParams.inBufferMode == ZSTD_bm_stable) {
         cctx->expectedInBuffer = *input;
     }
@@ -5410,22 +6209,27 @@ static size_t ZSTD_checkBufferStability(ZSTD_CCtx const* cctx,
 {
     if (cctx->appliedParams.inBufferMode == ZSTD_bm_stable) {
         ZSTD_inBuffer const expect = cctx->expectedInBuffer;
-        if (expect.src != input->src || expect.pos != input->pos || expect.size != input->size)
-            RETURN_ERROR(srcBuffer_wrong, "ZSTD_c_stableInBuffer enabled but input differs!");
-        if (endOp != ZSTD_e_end)
-            RETURN_ERROR(srcBuffer_wrong, "ZSTD_c_stableInBuffer can only be used with ZSTD_e_end!");
+        if (expect.src != input->src || expect.pos != input->pos)
+            RETURN_ERROR(stabilityCondition_notRespected, "ZSTD_c_stableInBuffer enabled but input differs!");
     }
+    (void)endOp;
     if (cctx->appliedParams.outBufferMode == ZSTD_bm_stable) {
         size_t const outBufferSize = output->size - output->pos;
         if (cctx->expectedOutBufferSize != outBufferSize)
-            RETURN_ERROR(dstBuffer_wrong, "ZSTD_c_stableOutBuffer enabled but output size differs!");
+            RETURN_ERROR(stabilityCondition_notRespected, "ZSTD_c_stableOutBuffer enabled but output size differs!");
     }
     return 0;
 }
 
+/*
+ * If @endOp == ZSTD_e_end, @inSize becomes pledgedSrcSize.
+ * Otherwise, it's ignored.
+ * @return: 0 on success, or a ZSTD_error code otherwise.
+ */
 static size_t ZSTD_CCtx_init_compressStream2(ZSTD_CCtx* cctx,
                                              ZSTD_EndDirective endOp,
-                                             size_t inSize) {
+                                             size_t inSize)
+{
     ZSTD_CCtx_params params = cctx->requestedParams;
     ZSTD_prefixDict const prefixDict = cctx->prefixDict;
     FORWARD_IF_ERROR( ZSTD_initLocalDict(cctx) , ""); /* Init the local dict if present. */
@@ -5438,21 +6242,24 @@ static size_t ZSTD_CCtx_init_compressStream2(ZSTD_CCtx* cctx,
         */
         params.compressionLevel = cctx->cdict->compressionLevel;
     }
-    DEBUGLOG(4, "ZSTD_compressStream2 : transparent init stage");
-    if (endOp == ZSTD_e_end) cctx->pledgedSrcSizePlusOne = inSize + 1;  /* auto-fix pledgedSrcSize */
-    {
-        size_t const dictSize = prefixDict.dict
+    DEBUGLOG(4, "ZSTD_CCtx_init_compressStream2 : transparent init stage");
+    if (endOp == ZSTD_e_end) cctx->pledgedSrcSizePlusOne = inSize + 1;  /* auto-determine pledgedSrcSize */
+
+    {   size_t const dictSize = prefixDict.dict
                 ? prefixDict.dictSize
                 : (cctx->cdict ? cctx->cdict->dictContentSize : 0);
-        ZSTD_cParamMode_e const mode = ZSTD_getCParamMode(cctx->cdict, &params, cctx->pledgedSrcSizePlusOne - 1);
+        ZSTD_CParamMode_e const mode = ZSTD_getCParamMode(cctx->cdict, &params, cctx->pledgedSrcSizePlusOne - 1);
         params.cParams = ZSTD_getCParamsFromCCtxParams(
                 &params, cctx->pledgedSrcSizePlusOne-1,
                 dictSize, mode);
     }
 
-    params.useBlockSplitter = ZSTD_resolveBlockSplitterMode(params.useBlockSplitter, &params.cParams);
+    params.postBlockSplitter = ZSTD_resolveBlockSplitterMode(params.postBlockSplitter, &params.cParams);
     params.ldmParams.enableLdm = ZSTD_resolveEnableLdm(params.ldmParams.enableLdm, &params.cParams);
     params.useRowMatchFinder = ZSTD_resolveRowMatchFinderMode(params.useRowMatchFinder, &params.cParams);
+    params.validateSequences = ZSTD_resolveExternalSequenceValidation(params.validateSequences);
+    params.maxBlockSize = ZSTD_resolveMaxBlockSize(params.maxBlockSize);
+    params.searchForExternalRepcodes = ZSTD_resolveExternalRepcodeSearch(params.searchForExternalRepcodes, params.compressionLevel);
 
     {   U64 const pledgedSrcSize = cctx->pledgedSrcSizePlusOne - 1;
         assert(!ZSTD_isError(ZSTD_checkCParams(params.cParams)));
@@ -5468,7 +6275,7 @@ static size_t ZSTD_CCtx_init_compressStream2(ZSTD_CCtx* cctx,
             /* for small input: avoid automatic flush on reaching end of block, since
             * it would require to add a 3-bytes null block to end frame
             */
-            cctx->inBuffTarget = cctx->blockSize + (cctx->blockSize == pledgedSrcSize);
+            cctx->inBuffTarget = cctx->blockSizeMax + (cctx->blockSizeMax == pledgedSrcSize);
         } else {
             cctx->inBuffTarget = 0;
         }
@@ -5479,6 +6286,8 @@ static size_t ZSTD_CCtx_init_compressStream2(ZSTD_CCtx* cctx,
     return 0;
 }
 
+/* @return provides a minimum amount of data remaining to be flushed from internal buffers
+ */
 size_t ZSTD_compressStream2( ZSTD_CCtx* cctx,
                              ZSTD_outBuffer* output,
                              ZSTD_inBuffer* input,
@@ -5493,8 +6302,27 @@ size_t ZSTD_compressStream2( ZSTD_CCtx* cctx,
 
     /* transparent initialization stage */
     if (cctx->streamStage == zcss_init) {
-        FORWARD_IF_ERROR(ZSTD_CCtx_init_compressStream2(cctx, endOp, input->size), "CompressStream2 initialization failed");
-        ZSTD_setBufferExpectations(cctx, output, input);    /* Set initial buffer expectations now that we've initialized */
+        size_t const inputSize = input->size - input->pos;  /* no obligation to start from pos==0 */
+        size_t const totalInputSize = inputSize + cctx->stableIn_notConsumed;
+        if ( (cctx->requestedParams.inBufferMode == ZSTD_bm_stable) /* input is presumed stable, across invocations */
+          && (endOp == ZSTD_e_continue)                             /* no flush requested, more input to come */
+          && (totalInputSize < ZSTD_BLOCKSIZE_MAX) ) {              /* not even reached one block yet */
+            if (cctx->stableIn_notConsumed) {  /* not the first time */
+                /* check stable source guarantees */
+                RETURN_ERROR_IF(input->src != cctx->expectedInBuffer.src, stabilityCondition_notRespected, "stableInBuffer condition not respected: wrong src pointer");
+                RETURN_ERROR_IF(input->pos != cctx->expectedInBuffer.size, stabilityCondition_notRespected, "stableInBuffer condition not respected: externally modified pos");
+            }
+            /* pretend input was consumed, to give a sense forward progress */
+            input->pos = input->size;
+            /* save stable inBuffer, for later control, and flush/end */
+            cctx->expectedInBuffer = *input;
+            /* but actually input wasn't consumed, so keep track of position from where compression shall resume */
+            cctx->stableIn_notConsumed += inputSize;
+            /* don't initialize yet, wait for the first block of flush() order, for better parameters adaptation */
+            return ZSTD_FRAMEHEADERSIZE_MIN(cctx->requestedParams.format);  /* at least some header to produce */
+        }
+        FORWARD_IF_ERROR(ZSTD_CCtx_init_compressStream2(cctx, endOp, totalInputSize), "compressStream2 initialization failed");
+        ZSTD_setBufferExpectations(cctx, output, input);   /* Set initial buffer expectations now that we've initialized */
     }
     /* end of transparent initialization stage */
 
@@ -5512,13 +6340,20 @@ size_t ZSTD_compressStream2_simpleArgs (
                 const void* src, size_t srcSize, size_t* srcPos,
                 ZSTD_EndDirective endOp)
 {
-    ZSTD_outBuffer output = { dst, dstCapacity, *dstPos };
-    ZSTD_inBuffer  input  = { src, srcSize, *srcPos };
+    ZSTD_outBuffer output;
+    ZSTD_inBuffer  input;
+    output.dst = dst;
+    output.size = dstCapacity;
+    output.pos = *dstPos;
+    input.src = src;
+    input.size = srcSize;
+    input.pos = *srcPos;
     /* ZSTD_compressStream2() will check validity of dstPos and srcPos */
-    size_t const cErr = ZSTD_compressStream2(cctx, &output, &input, endOp);
-    *dstPos = output.pos;
-    *srcPos = input.pos;
-    return cErr;
+    {   size_t const cErr = ZSTD_compressStream2(cctx, &output, &input, endOp);
+        *dstPos = output.pos;
+        *srcPos = input.pos;
+        return cErr;
+    }
 }
 
 size_t ZSTD_compress2(ZSTD_CCtx* cctx,
@@ -5541,6 +6376,7 @@ size_t ZSTD_compress2(ZSTD_CCtx* cctx,
         /* Reset to the original values. */
         cctx->requestedParams.inBufferMode = originalInBufferMode;
         cctx->requestedParams.outBufferMode = originalOutBufferMode;
+        FORWARD_IF_ERROR(result, "ZSTD_compressStream2_simpleArgs failed");
         if (result != 0) {  /* compression not completed, due to lack of output space */
             assert(oPos == dstCapacity);
@@ -5551,64 +6387,67 @@ size_t ZSTD_compress2(ZSTD_CCtx* cctx,
     }
 }
 
-typedef struct {
-    U32 idx;             /* Index in array of ZSTD_Sequence */
-    U32 posInSequence;   /* Position within sequence at idx */
-    size_t posInSrc;     /* Number of bytes given by sequences provided so far */
-} ZSTD_sequencePosition;
-
 /* ZSTD_validateSequence() :
- * @offCode : is presumed to follow format required by ZSTD_storeSeq()
+ * @offBase : must use the format required by ZSTD_storeSeq()
 * @returns a ZSTD error code if sequence is not valid */
 static size_t
-ZSTD_validateSequence(U32 offCode, U32 matchLength,
-                      size_t posInSrc, U32 windowLog, size_t dictSize)
+ZSTD_validateSequence(U32 offBase, U32 matchLength, U32 minMatch,
                       size_t posInSrc, U32 windowLog, size_t dictSize, int useSequenceProducer)
 {
-    U32 const windowSize = 1 << windowLog;
+    U32 const windowSize = 1u << windowLog;
     /* posInSrc represents the amount of data the decoder would decode up to this point.
     * As long as the amount of data decoded is less than or equal to window size, offsets may be
     * larger than the total length of output decoded in order to reference the dict, even larger than
     * window size. After output surpasses windowSize, we're limited to windowSize offsets again.
     */
     size_t const offsetBound = posInSrc > windowSize ? (size_t)windowSize : posInSrc + (size_t)dictSize;
-    RETURN_ERROR_IF(offCode > STORE_OFFSET(offsetBound), corruption_detected, "Offset too large!");
-    RETURN_ERROR_IF(matchLength < MINMATCH, corruption_detected, "Matchlength too small");
+    size_t const matchLenLowerBound = (minMatch == 3 || useSequenceProducer) ? 3 : 4;
+    RETURN_ERROR_IF(offBase > OFFSET_TO_OFFBASE(offsetBound), externalSequences_invalid, "Offset too large!");
+    /* Validate maxNbSeq is large enough for the given matchLength and minMatch */
+    RETURN_ERROR_IF(matchLength < matchLenLowerBound, externalSequences_invalid, "Matchlength too small for the minMatch");
     return 0;
 }
 
 /* Returns an offset code, given a sequence's raw offset, the ongoing repcode array, and whether litLength == 0 */
-static U32 ZSTD_finalizeOffCode(U32 rawOffset, const U32 rep[ZSTD_REP_NUM], U32 ll0)
+static U32 ZSTD_finalizeOffBase(U32 rawOffset, const U32 rep[ZSTD_REP_NUM], U32 ll0)
 {
-    U32 offCode = STORE_OFFSET(rawOffset);
+    U32 offBase = OFFSET_TO_OFFBASE(rawOffset);
 
     if (!ll0 && rawOffset == rep[0]) {
-        offCode = STORE_REPCODE_1;
+        offBase = REPCODE1_TO_OFFBASE;
     } else if (rawOffset == rep[1]) {
-        offCode = STORE_REPCODE(2 - ll0);
+        offBase = REPCODE_TO_OFFBASE(2 - ll0);
    } else if (rawOffset == rep[2]) {
-        offCode = STORE_REPCODE(3 - ll0);
+        offBase = REPCODE_TO_OFFBASE(3 - ll0);
    } else if (ll0 && rawOffset == rep[0] - 1) {
-        offCode = STORE_REPCODE_3;
+        offBase = REPCODE3_TO_OFFBASE;
    }
-    return offCode;
+    return offBase;
 }
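A quick worked example of the mapping above: with repcode history
rep = {8, 4, 2} and litLength > 0 (ll0 == 0), a raw offset of 8 becomes
REPCODE1_TO_OFFBASE, 4 becomes REPCODE_TO_OFFBASE(2), and 2 becomes
REPCODE_TO_OFFBASE(3); any other offset, say 100, is stored as
OFFSET_TO_OFFBASE(100). With litLength == 0 (ll0 == 1), rep[0] itself is
not eligible, but rep[0] - 1 == 7 maps to REPCODE3_TO_OFFBASE.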
+            offBase = ZSTD_finalizeOffBase(inSeqs[idx].offset, updatedRepcodes.rep, ll0);
+            ZSTD_updateRep(updatedRepcodes.rep, offBase, ll0);
+        }
 
-        DEBUGLOG(6, "Storing sequence: (of: %u, ml: %u, ll: %u)", offCode, matchLength, litLength);
+        DEBUGLOG(6, "Storing sequence: (of: %u, ml: %u, ll: %u)", offBase, matchLength, litLength);
         if (cctx->appliedParams.validateSequences) {
             seqPos->posInSrc += litLength + matchLength;
-            FORWARD_IF_ERROR(ZSTD_validateSequence(offCode, matchLength, seqPos->posInSrc,
-                                                cctx->appliedParams.cParams.windowLog, dictSize),
+            FORWARD_IF_ERROR(ZSTD_validateSequence(offBase, matchLength, cctx->appliedParams.cParams.minMatch,
+                                                seqPos->posInSrc,
+                                                cctx->appliedParams.cParams.windowLog, dictSize,
+                                                ZSTD_hasExtSeqProd(&cctx->appliedParams)),
                                                 "Sequence validation failed");
        }
-        RETURN_ERROR_IF(idx - seqPos->idx > cctx->seqStore.maxNbSeq, memory_allocation,
+        RETURN_ERROR_IF(idx - seqPos->idx >= cctx->seqStore.maxNbSeq, externalSequences_invalid,
                        "Not enough memory allocated. Try adjusting ZSTD_c_minMatch.");
-        ZSTD_storeSeq(&cctx->seqStore, litLength, ip, iend, offCode, matchLength);
+        ZSTD_storeSeq(&cctx->seqStore, litLength, ip, iend, offBase, matchLength);
         ip += matchLength + litLength;
     }
-    ZSTD_memcpy(cctx->blockState.nextCBlock->rep, updatedRepcodes.rep, sizeof(repcodes_t));
+    RETURN_ERROR_IF(idx == inSeqsSize, externalSequences_invalid, "Block delimiter not found.");
+
+    /* If we skipped repcode search while parsing, we need to update repcodes now */
+    assert(externalRepSearch != ZSTD_ps_auto);
+    assert(idx >= startIdx);
+    if (externalRepSearch == ZSTD_ps_disable && idx != startIdx) {
+        U32* const rep = updatedRepcodes.rep;
+        U32 lastSeqIdx = idx - 1; /* index of last non-block-delimiter sequence */
+
+        if (lastSeqIdx >= startIdx + 2) {
+            rep[2] = inSeqs[lastSeqIdx - 2].offset;
+            rep[1] = inSeqs[lastSeqIdx - 1].offset;
+            rep[0] = inSeqs[lastSeqIdx].offset;
+        } else if (lastSeqIdx == startIdx + 1) {
+            rep[2] = rep[0];
+            rep[1] = inSeqs[lastSeqIdx - 1].offset;
+            rep[0] = inSeqs[lastSeqIdx].offset;
+        } else {
+            assert(lastSeqIdx == startIdx);
+            rep[2] = rep[1];
+            rep[1] = rep[0];
+            rep[0] = inSeqs[lastSeqIdx].offset;
+        }
+    }
+
+    ZSTD_memcpy(cctx->blockState.nextCBlock->rep, updatedRepcodes.rep, sizeof(Repcodes_t));
 
     if (inSeqs[idx].litLength) {
         DEBUGLOG(6, "Storing last literals of size: %u", inSeqs[idx].litLength);
@@ -5644,37 +6516,43 @@ ZSTD_copySequencesToSeqStoreExplicitBlockDelim(ZSTD_CCtx* cctx,
         ip += inSeqs[idx].litLength;
         seqPos->posInSrc += inSeqs[idx].litLength;
     }
-    RETURN_ERROR_IF(ip != iend, corruption_detected, "Blocksize doesn't agree with block delimiter!");
+    RETURN_ERROR_IF(ip != iend, externalSequences_invalid, "Blocksize doesn't agree with block delimiter!");
     seqPos->idx = idx+1;
-    return 0;
+    return blockSize;
 }
 
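As an illustration of the explicit-delimiter format consumed above (a
sketch using upstream's public ZSTD_Sequence type; the byte values are
made up):

    /* A 100-byte block described with explicit delimiters:
     * 20 literals, a 30-byte match at offset 20,
     * 10 literals, a 25-byte match at offset 55,
     * then a delimiter carrying the final 15 literals. */
    const ZSTD_Sequence seqs[] = {
        { 20, 20, 30, 0 },   /* offset, litLength, matchLength, rep (unused) */
        { 55, 10, 25, 0 },
        {  0, 15,  0, 0 },   /* block delimiter: offset == 0 && matchLength == 0 */
    };
    /* sum of litLength + matchLength: 20+30+10+25+15 == 100 == blockSize */
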
+ * Occasionally, we may want to reduce the actual number of bytes consumed= from @src + * to avoid splitting a match, notably if it would produce a match smaller= than MINMATCH. * - * Occasionally, we may want to change the actual number of bytes we consu= med from inSeqs to - * avoid splitting a match, or to avoid splitting a match such that it wou= ld produce a match - * smaller than MINMATCH. In this case, we return the number of bytes that= we didn't read from this block. + * @returns the number of bytes consumed from @src, necessarily <=3D @bloc= kSize. + * Otherwise, it may return a ZSTD error if something went wrong. */ static size_t -ZSTD_copySequencesToSeqStoreNoBlockDelim(ZSTD_CCtx* cctx, ZSTD_sequencePos= ition* seqPos, - const ZSTD_Sequence* const inSeqs, size= _t inSeqsSize, - const void* src, size_t blockSize) +ZSTD_transferSequences_noDelim(ZSTD_CCtx* cctx, + ZSTD_SequencePosition* seqPos, + const ZSTD_Sequence* const inSeqs, size_t inSeqsS= ize, + const void* src, size_t blockSize, + ZSTD_ParamSwitch_e externalRepSearch) { U32 idx =3D seqPos->idx; U32 startPosInSequence =3D seqPos->posInSequence; U32 endPosInSequence =3D seqPos->posInSequence + (U32)blockSize; size_t dictSize; - BYTE const* ip =3D (BYTE const*)(src); - BYTE const* iend =3D ip + blockSize; /* May be adjusted if we decide = to process fewer than blockSize bytes */ - repcodes_t updatedRepcodes; + const BYTE* const istart =3D (const BYTE*)(src); + const BYTE* ip =3D istart; + const BYTE* iend =3D istart + blockSize; /* May be adjusted if we dec= ide to process fewer than blockSize bytes */ + Repcodes_t updatedRepcodes; U32 bytesAdjustment =3D 0; U32 finalMatchSplit =3D 0; =20 + /* TODO(embg) support fast parsing mode in noBlockDelim mode */ + (void)externalRepSearch; + if (cctx->cdict) { dictSize =3D cctx->cdict->dictContentSize; } else if (cctx->prefixDict.dict) { @@ -5682,15 +6560,15 @@ ZSTD_copySequencesToSeqStoreNoBlockDelim(ZSTD_CCtx*= cctx, ZSTD_sequencePosition* } else { dictSize =3D 0; } - DEBUGLOG(5, "ZSTD_copySequencesToSeqStore: idx: %u PIS: %u blockSize: = %zu", idx, startPosInSequence, blockSize); + DEBUGLOG(5, "ZSTD_transferSequences_noDelim: idx: %u PIS: %u blockSize= : %zu", idx, startPosInSequence, blockSize); DEBUGLOG(5, "Start seq: idx: %u (of: %u ml: %u ll: %u)", idx, inSeqs[i= dx].offset, inSeqs[idx].matchLength, inSeqs[idx].litLength); - ZSTD_memcpy(updatedRepcodes.rep, cctx->blockState.prevCBlock->rep, siz= eof(repcodes_t)); + ZSTD_memcpy(updatedRepcodes.rep, cctx->blockState.prevCBlock->rep, siz= eof(Repcodes_t)); while (endPosInSequence && idx < inSeqsSize && !finalMatchSplit) { const ZSTD_Sequence currSeq =3D inSeqs[idx]; U32 litLength =3D currSeq.litLength; U32 matchLength =3D currSeq.matchLength; U32 const rawOffset =3D currSeq.offset; - U32 offCode; + U32 offBase; =20 /* Modify the sequence depending on where endPosInSequence lies */ if (endPosInSequence >=3D currSeq.litLength + currSeq.matchLength)= { @@ -5704,7 +6582,6 @@ ZSTD_copySequencesToSeqStoreNoBlockDelim(ZSTD_CCtx* c= ctx, ZSTD_sequencePosition* /* Move to the next sequence */ endPosInSequence -=3D currSeq.litLength + currSeq.matchLength; startPosInSequence =3D 0; - idx++; } else { /* This is the final (partial) sequence we're adding from inSe= qs, and endPosInSequence does not reach the end of the match. 
So, we have to split t= he sequence */ @@ -5744,58 +6621,113 @@ ZSTD_copySequencesToSeqStoreNoBlockDelim(ZSTD_CCtx= * cctx, ZSTD_sequencePosition* } /* Check if this offset can be represented with a repcode */ { U32 const ll0 =3D (litLength =3D=3D 0); - offCode =3D ZSTD_finalizeOffCode(rawOffset, updatedRepcodes.re= p, ll0); - ZSTD_updateRep(updatedRepcodes.rep, offCode, ll0); + offBase =3D ZSTD_finalizeOffBase(rawOffset, updatedRepcodes.re= p, ll0); + ZSTD_updateRep(updatedRepcodes.rep, offBase, ll0); } =20 if (cctx->appliedParams.validateSequences) { seqPos->posInSrc +=3D litLength + matchLength; - FORWARD_IF_ERROR(ZSTD_validateSequence(offCode, matchLength, s= eqPos->posInSrc, - cctx->appliedParams.cPa= rams.windowLog, dictSize), + FORWARD_IF_ERROR(ZSTD_validateSequence(offBase, matchLength, c= ctx->appliedParams.cParams.minMatch, seqPos->posInSrc, + cctx->appliedParams.cPa= rams.windowLog, dictSize, ZSTD_hasExtSeqProd(&cctx->appliedParams)), "Sequence validation fa= iled"); } - DEBUGLOG(6, "Storing sequence: (of: %u, ml: %u, ll: %u)", offCode,= matchLength, litLength); - RETURN_ERROR_IF(idx - seqPos->idx > cctx->seqStore.maxNbSeq, memor= y_allocation, + DEBUGLOG(6, "Storing sequence: (of: %u, ml: %u, ll: %u)", offBase,= matchLength, litLength); + RETURN_ERROR_IF(idx - seqPos->idx >=3D cctx->seqStore.maxNbSeq, ex= ternalSequences_invalid, "Not enough memory allocated. Try adjusting ZSTD_c= _minMatch."); - ZSTD_storeSeq(&cctx->seqStore, litLength, ip, iend, offCode, match= Length); + ZSTD_storeSeq(&cctx->seqStore, litLength, ip, iend, offBase, match= Length); ip +=3D matchLength + litLength; + if (!finalMatchSplit) + idx++; /* Next Sequence */ } DEBUGLOG(5, "Ending seq: idx: %u (of: %u ml: %u ll: %u)", idx, inSeqs[= idx].offset, inSeqs[idx].matchLength, inSeqs[idx].litLength); assert(idx =3D=3D inSeqsSize || endPosInSequence <=3D inSeqs[idx].litL= ength + inSeqs[idx].matchLength); seqPos->idx =3D idx; seqPos->posInSequence =3D endPosInSequence; - ZSTD_memcpy(cctx->blockState.nextCBlock->rep, updatedRepcodes.rep, siz= eof(repcodes_t)); + ZSTD_memcpy(cctx->blockState.nextCBlock->rep, updatedRepcodes.rep, siz= eof(Repcodes_t)); =20 iend -=3D bytesAdjustment; if (ip !=3D iend) { /* Store any last literals */ - U32 lastLLSize =3D (U32)(iend - ip); + U32 const lastLLSize =3D (U32)(iend - ip); assert(ip <=3D iend); DEBUGLOG(6, "Storing last literals of size: %u", lastLLSize); ZSTD_storeLastLiterals(&cctx->seqStore, ip, lastLLSize); seqPos->posInSrc +=3D lastLLSize; } =20 - return bytesAdjustment; + return (size_t)(iend-istart); } =20 -typedef size_t (*ZSTD_sequenceCopier) (ZSTD_CCtx* cctx, ZSTD_sequencePosit= ion* seqPos, - const ZSTD_Sequence* const inSeqs, = size_t inSeqsSize, - const void* src, size_t blockSize); -static ZSTD_sequenceCopier ZSTD_selectSequenceCopier(ZSTD_sequenceFormat_e= mode) +/* @seqPos represents a position within @inSeqs, + * it is read and updated by this function, + * once the goal to produce a block of size @blockSize is reached. + * @return: nb of bytes consumed from @src, necessarily <=3D @blockSize. 
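+ * Two implementations exist, selected by ZSTD_selectSequenceCopier() below:
+ * ZSTD_transferSequences_wBlockDelim() for explicit block delimiters,
+ * and ZSTD_transferSequences_noDelim() otherwise.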
+ */ +typedef size_t (*ZSTD_SequenceCopier_f)(ZSTD_CCtx* cctx, + ZSTD_SequencePosition* seqPos, + const ZSTD_Sequence* const inSeqs, size_= t inSeqsSize, + const void* src, size_t blockSize, + ZSTD_ParamSwitch_e externalRepSear= ch); + +static ZSTD_SequenceCopier_f ZSTD_selectSequenceCopier(ZSTD_SequenceFormat= _e mode) { - ZSTD_sequenceCopier sequenceCopier =3D NULL; - assert(ZSTD_cParam_withinBounds(ZSTD_c_blockDelimiters, mode)); + assert(ZSTD_cParam_withinBounds(ZSTD_c_blockDelimiters, (int)mode)); if (mode =3D=3D ZSTD_sf_explicitBlockDelimiters) { - return ZSTD_copySequencesToSeqStoreExplicitBlockDelim; - } else if (mode =3D=3D ZSTD_sf_noBlockDelimiters) { - return ZSTD_copySequencesToSeqStoreNoBlockDelim; + return ZSTD_transferSequences_wBlockDelim; + } + assert(mode =3D=3D ZSTD_sf_noBlockDelimiters); + return ZSTD_transferSequences_noDelim; +} + +/* Discover the size of next block by searching for the delimiter. + * Note that a block delimiter **must** exist in this mode, + * otherwise it's an input error. + * The block size retrieved will be later compared to ensure it remains wi= thin bounds */ +static size_t +blockSize_explicitDelimiter(const ZSTD_Sequence* inSeqs, size_t inSeqsSize= , ZSTD_SequencePosition seqPos) +{ + int end =3D 0; + size_t blockSize =3D 0; + size_t spos =3D seqPos.idx; + DEBUGLOG(6, "blockSize_explicitDelimiter : seq %zu / %zu", spos, inSeq= sSize); + assert(spos <=3D inSeqsSize); + while (spos < inSeqsSize) { + end =3D (inSeqs[spos].offset =3D=3D 0); + blockSize +=3D inSeqs[spos].litLength + inSeqs[spos].matchLength; + if (end) { + if (inSeqs[spos].matchLength !=3D 0) + RETURN_ERROR(externalSequences_invalid, "delimiter format = error : both matchlength and offset must be =3D=3D 0"); + break; + } + spos++; } - assert(sequenceCopier !=3D NULL); - return sequenceCopier; + if (!end) + RETURN_ERROR(externalSequences_invalid, "Reached end of sequences = without finding a block delimiter"); + return blockSize; } =20 -/* Compress, block-by-block, all of the sequences given. +static size_t determine_blockSize(ZSTD_SequenceFormat_e mode, + size_t blockSize, size_t remaining, + const ZSTD_Sequence* inSeqs, size_t inSeqsSize, + ZSTD_SequencePosition seqPos) +{ + DEBUGLOG(6, "determine_blockSize : remainingSize =3D %zu", remaining); + if (mode =3D=3D ZSTD_sf_noBlockDelimiters) { + /* Note: more a "target" block size */ + return MIN(remaining, blockSize); + } + assert(mode =3D=3D ZSTD_sf_explicitBlockDelimiters); + { size_t const explicitBlockSize =3D blockSize_explicitDelimiter(inS= eqs, inSeqsSize, seqPos); + FORWARD_IF_ERROR(explicitBlockSize, "Error while determining block= size with explicit delimiters"); + if (explicitBlockSize > blockSize) + RETURN_ERROR(externalSequences_invalid, "sequences incorrectly= define a too large block"); + if (explicitBlockSize > remaining) + RETURN_ERROR(externalSequences_invalid, "sequences define a fr= ame longer than source"); + return explicitBlockSize; + } +} + +/* Compress all provided sequences, block-by-block. * * Returns the cumulative size of all compressed blocks (including their h= eaders), * otherwise a ZSTD error. 
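As a reader aid, here is a minimal sketch, not part of the patch, of the
sequence layout that blockSize_explicitDelimiter() and determine_blockSize()
operate on. All field values are invented, and the header name may differ
in-kernel:

    #include <zstd.h>   /* ZSTD_Sequence; in-kernel builds use linux/zstd_lib.h */

    static void example_block(ZSTD_Sequence seqs[3])
    {
        /* two ordinary sequences, then a block delimiter:
         * a delimiter has offset == 0 and matchLength == 0, and its
         * litLength carries the block's trailing literals. */
        seqs[0].offset = 64; seqs[0].litLength = 100; seqs[0].matchLength = 20; seqs[0].rep = 0;
        seqs[1].offset = 7;  seqs[1].litLength = 0;   seqs[1].matchLength = 12; seqs[1].rep = 0;
        seqs[2].offset = 0;  seqs[2].litLength = 5;   seqs[2].matchLength = 0;  seqs[2].rep = 0;
        /* blockSize_explicitDelimiter() sums the lengths:
         * (100+20) + (0+12) + 5 == 137 bytes for this block, which
         * determine_blockSize() then bounds-checks against the source. */
    }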
@@ -5807,15 +6739,12 @@ ZSTD_compressSequences_internal(ZSTD_CCtx* cctx, const void* src, size_t srcSize) { size_t cSize =3D 0; - U32 lastBlock; - size_t blockSize; - size_t compressedSeqsSize; size_t remaining =3D srcSize; - ZSTD_sequencePosition seqPos =3D {0, 0, 0}; + ZSTD_SequencePosition seqPos =3D {0, 0, 0}; =20 - BYTE const* ip =3D (BYTE const*)src; + const BYTE* ip =3D (BYTE const*)src; BYTE* op =3D (BYTE*)dst; - ZSTD_sequenceCopier const sequenceCopier =3D ZSTD_selectSequenceCopier= (cctx->appliedParams.blockDelimiters); + ZSTD_SequenceCopier_f const sequenceCopier =3D ZSTD_selectSequenceCopi= er(cctx->appliedParams.blockDelimiters); =20 DEBUGLOG(4, "ZSTD_compressSequences_internal srcSize: %zu, inSeqsSize:= %zu", srcSize, inSeqsSize); /* Special case: empty frame */ @@ -5829,22 +6758,29 @@ ZSTD_compressSequences_internal(ZSTD_CCtx* cctx, } =20 while (remaining) { + size_t compressedSeqsSize; size_t cBlockSize; - size_t additionalByteAdjustment; - lastBlock =3D remaining <=3D cctx->blockSize; - blockSize =3D lastBlock ? (U32)remaining : (U32)cctx->blockSize; + size_t blockSize =3D determine_blockSize(cctx->appliedParams.block= Delimiters, + cctx->blockSizeMax, remaining, + inSeqs, inSeqsSize, seqPos); + U32 const lastBlock =3D (blockSize =3D=3D remaining); + FORWARD_IF_ERROR(blockSize, "Error while trying to determine block= size"); + assert(blockSize <=3D remaining); ZSTD_resetSeqStore(&cctx->seqStore); - DEBUGLOG(4, "Working on new block. Blocksize: %zu", blockSize); =20 - additionalByteAdjustment =3D sequenceCopier(cctx, &seqPos, inSeqs,= inSeqsSize, ip, blockSize); - FORWARD_IF_ERROR(additionalByteAdjustment, "Bad sequence copy"); - blockSize -=3D additionalByteAdjustment; + blockSize =3D sequenceCopier(cctx, + &seqPos, inSeqs, inSeqsSize, + ip, blockSize, + cctx->appliedParams.searchForExternalRe= pcodes); + FORWARD_IF_ERROR(blockSize, "Bad sequence copy"); =20 /* If blocks are too small, emit as a nocompress block */ - if (blockSize < MIN_CBLOCK_SIZE+ZSTD_blockHeaderSize+1) { + /* TODO: See 3090. We reduced MIN_CBLOCK_SIZE from 3 to 2 so to co= mpensate we are adding + * additional 1. 
We need to revisit and change this logic to be mo= re consistent */ + if (blockSize < MIN_CBLOCK_SIZE+ZSTD_blockHeaderSize+1+1) { cBlockSize =3D ZSTD_noCompressBlock(op, dstCapacity, ip, block= Size, lastBlock); FORWARD_IF_ERROR(cBlockSize, "Nocompress block failed"); - DEBUGLOG(4, "Block too small, writing out nocompress block: cS= ize: %zu", cBlockSize); + DEBUGLOG(5, "Block too small (%zu): data remains uncompressed:= cSize=3D%zu", blockSize, cBlockSize); cSize +=3D cBlockSize; ip +=3D blockSize; op +=3D cBlockSize; @@ -5853,35 +6789,36 @@ ZSTD_compressSequences_internal(ZSTD_CCtx* cctx, continue; } =20 + RETURN_ERROR_IF(dstCapacity < ZSTD_blockHeaderSize, dstSize_tooSma= ll, "not enough dstCapacity to write a new compressed block"); compressedSeqsSize =3D ZSTD_entropyCompressSeqStore(&cctx->seqStor= e, &cctx->blockState.prevCBlock->entropy, &cc= tx->blockState.nextCBlock->entropy, &cctx->appliedParams, op + ZSTD_blockHeaderSize /* Leave space f= or block header */, dstCapacity - ZSTD_blockHeaderSize, blockSize, - cctx->entropyWorkspace, ENTROPY_WORKSPACE_= SIZE /* statically allocated in resetCCtx */, + cctx->tmpWorkspace, cctx->tmpWkspSize /* s= tatically allocated in resetCCtx */, cctx->bmi2); FORWARD_IF_ERROR(compressedSeqsSize, "Compressing sequences of blo= ck failed"); - DEBUGLOG(4, "Compressed sequences size: %zu", compressedSeqsSize); + DEBUGLOG(5, "Compressed sequences size: %zu", compressedSeqsSize); =20 if (!cctx->isFirstBlock && ZSTD_maybeRLE(&cctx->seqStore) && - ZSTD_isRLE((BYTE const*)src, srcSize)) { - /* We don't want to emit our first block as a RLE even if it q= ualifies because - * doing so will cause the decoder (cli only) to throw a "shoul= d consume all input error." - * This is only an issue for zstd <=3D v1.4.3 - */ + ZSTD_isRLE(ip, blockSize)) { + /* Note: don't emit the first block as RLE even if it qualifie= s because + * doing so will cause the decoder (cli <=3D v1.4.3 only) to t= hrow an (invalid) error + * "should consume all input error." 
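+             * (For reference: an RLE block stores one byte repeated blockSize
+             * times; setting compressedSeqsSize = 1 routes the block to
+             * ZSTD_rleCompressBlock() below.)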
+ */ compressedSeqsSize =3D 1; } =20 if (compressedSeqsSize =3D=3D 0) { /* ZSTD_noCompressBlock writes the block header as well */ cBlockSize =3D ZSTD_noCompressBlock(op, dstCapacity, ip, block= Size, lastBlock); - FORWARD_IF_ERROR(cBlockSize, "Nocompress block failed"); - DEBUGLOG(4, "Writing out nocompress block, size: %zu", cBlockS= ize); + FORWARD_IF_ERROR(cBlockSize, "ZSTD_noCompressBlock failed"); + DEBUGLOG(5, "Writing out nocompress block, size: %zu", cBlockS= ize); } else if (compressedSeqsSize =3D=3D 1) { cBlockSize =3D ZSTD_rleCompressBlock(op, dstCapacity, *ip, blo= ckSize, lastBlock); - FORWARD_IF_ERROR(cBlockSize, "RLE compress block failed"); - DEBUGLOG(4, "Writing out RLE block, size: %zu", cBlockSize); + FORWARD_IF_ERROR(cBlockSize, "ZSTD_rleCompressBlock failed"); + DEBUGLOG(5, "Writing out RLE block, size: %zu", cBlockSize); } else { U32 cBlockHeader; /* Error checking and repcodes update */ @@ -5893,11 +6830,10 @@ ZSTD_compressSequences_internal(ZSTD_CCtx* cctx, cBlockHeader =3D lastBlock + (((U32)bt_compressed)<<1) + (U32)= (compressedSeqsSize << 3); MEM_writeLE24(op, cBlockHeader); cBlockSize =3D ZSTD_blockHeaderSize + compressedSeqsSize; - DEBUGLOG(4, "Writing out compressed block, size: %zu", cBlockS= ize); + DEBUGLOG(5, "Writing out compressed block, size: %zu", cBlockS= ize); } =20 cSize +=3D cBlockSize; - DEBUGLOG(4, "cSize running total: %zu", cSize); =20 if (lastBlock) { break; @@ -5908,41 +6844,50 @@ ZSTD_compressSequences_internal(ZSTD_CCtx* cctx, dstCapacity -=3D cBlockSize; cctx->isFirstBlock =3D 0; } + DEBUGLOG(5, "cSize running total: %zu (remaining dstCapacity=3D%zu= )", cSize, dstCapacity); } =20 + DEBUGLOG(4, "cSize final total: %zu", cSize); return cSize; } =20 -size_t ZSTD_compressSequences(ZSTD_CCtx* const cctx, void* dst, size_t dst= Capacity, +size_t ZSTD_compressSequences(ZSTD_CCtx* cctx, + void* dst, size_t dstCapacity, const ZSTD_Sequence* inSeqs, size_t inSeqsSi= ze, const void* src, size_t srcSize) { BYTE* op =3D (BYTE*)dst; size_t cSize =3D 0; - size_t compressedBlocksSize =3D 0; - size_t frameHeaderSize =3D 0; =20 /* Transparent initialization stage, same as compressStream2() */ - DEBUGLOG(3, "ZSTD_compressSequences()"); + DEBUGLOG(4, "ZSTD_compressSequences (nbSeqs=3D%zu,dstCapacity=3D%zu)",= inSeqsSize, dstCapacity); assert(cctx !=3D NULL); FORWARD_IF_ERROR(ZSTD_CCtx_init_compressStream2(cctx, ZSTD_e_end, srcS= ize), "CCtx initialization failed"); + /* Begin writing output, starting with frame header */ - frameHeaderSize =3D ZSTD_writeFrameHeader(op, dstCapacity, &cctx->appl= iedParams, srcSize, cctx->dictID); - op +=3D frameHeaderSize; - dstCapacity -=3D frameHeaderSize; - cSize +=3D frameHeaderSize; + { size_t const frameHeaderSize =3D ZSTD_writeFrameHeader(op, dstCapa= city, + &cctx->appliedParams, srcSize, cctx->dictID); + op +=3D frameHeaderSize; + assert(frameHeaderSize <=3D dstCapacity); + dstCapacity -=3D frameHeaderSize; + cSize +=3D frameHeaderSize; + } if (cctx->appliedParams.fParams.checksumFlag && srcSize) { xxh64_update(&cctx->xxhState, src, srcSize); } - /* cSize includes block header size and compressed sequences size */ - compressedBlocksSize =3D ZSTD_compressSequences_internal(cctx, + + /* Now generate compressed blocks */ + { size_t const cBlocksSize =3D ZSTD_compressSequences_internal(cctx, op, dstCapacity, inSeqs, inSeqsS= ize, src, srcSize); - FORWARD_IF_ERROR(compressedBlocksSize, "Compressing blocks failed!"); - cSize +=3D compressedBlocksSize; - dstCapacity -=3D compressedBlocksSize; + 
FORWARD_IF_ERROR(cBlocksSize, "Compressing blocks failed!");
+        cSize += cBlocksSize;
+        assert(cBlocksSize <= dstCapacity);
+        dstCapacity -= cBlocksSize;
+    }

+    /* Complete with frame checksum, if needed */
     if (cctx->appliedParams.fParams.checksumFlag) {
         U32 const checksum = (U32) xxh64_digest(&cctx->xxhState);
         RETURN_ERROR_IF(dstCapacity<4, dstSize_tooSmall, "no room for checksum");
@@ -5951,26 +6896,557 @@ size_t ZSTD_compressSequences(ZSTD_CCtx* const cctx, void* dst, size_t dstCapaci
         cSize += 4;
     }

-    DEBUGLOG(3, "Final compressed size: %zu", cSize);
+    DEBUGLOG(4, "Final compressed size: %zu", cSize);
+    return cSize;
+}
+
+
+#if defined(__AVX2__)
+
+#include <immintrin.h> /* AVX2 intrinsics */
+
+/*
+ * Convert 2 sequences per iteration, using AVX2 intrinsics:
+ *   - offset -> offBase = offset + 2
+ *   - litLength -> (U16) litLength
+ *   - matchLength -> (U16)(matchLength - 3)
+ *   - rep is ignored
+ * Store only 8 bytes per SeqDef (offBase[4], litLength[2], mlBase[2]).
+ *
+ * At the end, instead of extracting two __m128i,
+ * we use _mm256_permute4x64_epi64(..., 0xE8) to move lane2 into lane1,
+ * then store the lower 16 bytes in one go.
+ *
+ * @returns 0 on success, with no long length detected
+ * @returns > 0 if there is one long length (> 65535),
+ *           indicating the position, and type.
+ */
+static size_t convertSequences_noRepcodes(
+    SeqDef* dstSeqs,
+    const ZSTD_Sequence* inSeqs,
+    size_t nbSequences)
+{
+    /*
+     * addition:
+     *   For each 128-bit half: (offset+2, litLength+0, matchLength-3, rep+0)
+     */
+    const __m256i addition = _mm256_setr_epi32(
+        ZSTD_REP_NUM, 0, -MINMATCH, 0,    /* for sequence i */
+        ZSTD_REP_NUM, 0, -MINMATCH, 0     /* for sequence i+1 */
+    );
+
+    /* limit: check if there is a long length */
+    const __m256i limit = _mm256_set1_epi32(65535);
+
+    /*
+     * shuffle mask for byte-level rearrangement in each 128-bit half:
+     *
+     * Input layout (after addition) per 128-bit half:
+     *   [ offset+2 (4 bytes) | litLength (4 bytes) | matchLength (4 bytes) | rep (4 bytes) ]
+     * We only need:
+     *   offBase (4 bytes) = offset+2
+     *   litLength (2 bytes) = low 2 bytes of litLength
+     *   mlBase (2 bytes) = low 2 bytes of (matchLength)
+     * => Bytes [0..3, 4..5, 8..9], zero the rest.
+     */
+    const __m256i mask = _mm256_setr_epi8(
+        /* For the lower 128 bits => sequence i */
+        0, 1, 2, 3,       /* offset+2 */
+        4, 5,             /* litLength (16 bits) */
+        8, 9,             /* matchLength (16 bits) */
+        (BYTE)0x80, (BYTE)0x80, (BYTE)0x80, (BYTE)0x80,
+        (BYTE)0x80, (BYTE)0x80, (BYTE)0x80, (BYTE)0x80,
+
+        /* For the upper 128 bits => sequence i+1 */
+        16,17,18,19,      /* offset+2 */
+        20,21,            /* litLength */
+        24,25,            /* matchLength */
+        (BYTE)0x80, (BYTE)0x80, (BYTE)0x80, (BYTE)0x80,
+        (BYTE)0x80, (BYTE)0x80, (BYTE)0x80, (BYTE)0x80
+    );
+
+    /*
+     * Next, we'll use _mm256_permute4x64_epi64(vshf, 0xE8).
+     * Explanation of 0xE8 = 11101000b => [lane0, lane2, lane2, lane3].
+     * So the lower 128 bits become [lane0, lane2] => combining seq0 and seq1.
+     */
+#define PERM_LANE_0X_E8 0xE8  /* [0,2,2,3] in lane indices */
+
+    size_t longLen = 0, i = 0;
+
+    /* AVX permutation depends on the specific definition of target structures */
+    ZSTD_STATIC_ASSERT(sizeof(ZSTD_Sequence) == 16);
+    ZSTD_STATIC_ASSERT(offsetof(ZSTD_Sequence, offset) == 0);
+    ZSTD_STATIC_ASSERT(offsetof(ZSTD_Sequence, litLength) == 4);
+    ZSTD_STATIC_ASSERT(offsetof(ZSTD_Sequence, matchLength) == 8);
+    ZSTD_STATIC_ASSERT(sizeof(SeqDef) == 8);
+    ZSTD_STATIC_ASSERT(offsetof(SeqDef, offBase) == 0);
+    ZSTD_STATIC_ASSERT(offsetof(SeqDef, litLength) == 4);
+    ZSTD_STATIC_ASSERT(offsetof(SeqDef, mlBase) == 6);
+
+    /* Process 2 sequences per loop iteration */
+    for (; i + 1 < nbSequences; i += 2) {
+        /* Load 2 ZSTD_Sequence (32 bytes) */
+        __m256i vin  = _mm256_loadu_si256((const __m256i*)(const void*)&inSeqs[i]);
+
+        /* Add {2, 0, -3, 0} in each 128-bit half */
+        __m256i vadd = _mm256_add_epi32(vin, addition);
+
+        /* Check for long length */
+        __m256i ll_cmp = _mm256_cmpgt_epi32(vadd, limit);  /* 0xFFFFFFFF for element > 65535 */
+        int ll_res = _mm256_movemask_epi8(ll_cmp);
+
+        /* Shuffle bytes so each half gives us the 8 bytes we need */
+        __m256i vshf = _mm256_shuffle_epi8(vadd, mask);
+        /*
+         * Now:
+         *   Lane0 = seq0's 8 bytes
+         *   Lane1 = 0
+         *   Lane2 = seq1's 8 bytes
+         *   Lane3 = 0
+         */
+
+        /* Permute 64-bit lanes => move Lane2 down into Lane1. */
+        __m256i vperm = _mm256_permute4x64_epi64(vshf, PERM_LANE_0X_E8);
+        /*
+         * Now the lower 16 bytes (Lane0+Lane1) = [seq0, seq1].
+         * The upper 16 bytes are [Lane2, Lane3] = [seq1, 0], but we won't use them.
+         */
+
+        /* Store only the lower 16 bytes => 2 SeqDef (8 bytes each) */
+        _mm_storeu_si128((__m128i *)(void*)&dstSeqs[i], _mm256_castsi256_si128(vperm));
+        /*
+         * This writes out 16 bytes total:
+         * - offset 0..7  => seq0 (offBase, litLength, mlBase)
+         * - offset 8..15 => seq1 (offBase, litLength, mlBase)
+         */
+
+        /* check (unlikely) long lengths > 65535
+         * indices for lengths correspond to bits [4..7], [8..11], [20..23], [24..27]
+         * => combined mask = 0x0FF00FF0
+         */
+        if (UNLIKELY((ll_res & 0x0FF00FF0) != 0)) {
+            /* long length detected: let's figure out which one */
+            if (inSeqs[i].matchLength > 65535+MINMATCH) {
+                assert(longLen == 0);
+                longLen = i + 1;
+            }
+            if (inSeqs[i].litLength > 65535) {
+                assert(longLen == 0);
+                longLen = i + nbSequences + 1;
+            }
+            if (inSeqs[i+1].matchLength > 65535+MINMATCH) {
+                assert(longLen == 0);
+                longLen = i + 1 + 1;
+            }
+            if (inSeqs[i+1].litLength > 65535) {
+                assert(longLen == 0);
+                longLen = i + 1 + nbSequences + 1;
+            }
+        }
+    }
+
+    /* Handle leftover if @nbSequences is odd */
+    if (i < nbSequences) {
+        /* process last sequence */
+        assert(i == nbSequences - 1);
+        dstSeqs[i].offBase = OFFSET_TO_OFFBASE(inSeqs[i].offset);
+        dstSeqs[i].litLength = (U16)inSeqs[i].litLength;
+        dstSeqs[i].mlBase = (U16)(inSeqs[i].matchLength - MINMATCH);
+        /* check (unlikely) long lengths > 65535 */
+        if (UNLIKELY(inSeqs[i].matchLength > 65535+MINMATCH)) {
+            assert(longLen == 0);
+            longLen = i + 1;
+        }
+        if (UNLIKELY(inSeqs[i].litLength > 65535)) {
+            assert(longLen == 0);
+            longLen = i + nbSequences + 1;
+        }
+    }
+
+    return longLen;
+}
+
+/* the vector implementation could also be ported to SSSE3,
+ * but since this implementation is targeting modern systems (>= Sapphire Rapids),
+ * it's not useful to develop and maintain code for older pre-AVX2 platforms */
+
+#else /* no AVX2 */
+
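+/* Note on the return convention shared by both implementations, as decoded
+ * by ZSTD_convertBlockSequences() below (nbSequences == the count passed in):
+ *   longLen == 0                : no litLength or matchLength exceeded 65535
+ *   1 <= longLen <= nbSequences : matchLength of sequence longLen-1 is long
+ *   longLen >  nbSequences      : litLength of sequence longLen-nbSequences-1 is long
+ */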
+static size_t convertSequences_noRepcodes(
+    SeqDef* dstSeqs,
+    const ZSTD_Sequence* inSeqs,
+    size_t nbSequences)
+{
+    size_t longLen = 0;
+    size_t n;
+    for (n=0; n<nbSequences; n++) {
+        dstSeqs[n].offBase = OFFSET_TO_OFFBASE(inSeqs[n].offset);
+        dstSeqs[n].litLength = (U16)inSeqs[n].litLength;
+        dstSeqs[n].mlBase = (U16)(inSeqs[n].matchLength - MINMATCH);
+        /* check for long length > 65535 */
+        if (UNLIKELY(inSeqs[n].matchLength > 65535+MINMATCH)) {
+            assert(longLen == 0);
+            longLen = n + 1;
+        }
+        if (UNLIKELY(inSeqs[n].litLength > 65535)) {
+            assert(longLen == 0);
+            longLen = n + nbSequences + 1;
+        }
+    }
+    return longLen;
+}
+
+#endif
+
+/*
+ * Precondition: Sequences must end on an explicit Block Delimiter
+ * @return: 0 on success, or an error code.
+ * Note: Sequence validation functionality has been disabled (removed).
+ * This is helpful to generate a lean main pipeline, improving performance.
+ * It may be re-inserted later.
+ */
+size_t ZSTD_convertBlockSequences(ZSTD_CCtx* cctx,
+                const ZSTD_Sequence* const inSeqs, size_t nbSequences,
+                int repcodeResolution)
+{
+    Repcodes_t updatedRepcodes;
+    size_t seqNb = 0;
+
+    DEBUGLOG(5, "ZSTD_convertBlockSequences (nbSequences = %zu)", nbSequences);
+
+    RETURN_ERROR_IF(nbSequences >= cctx->seqStore.maxNbSeq, externalSequences_invalid,
+                    "Not enough memory allocated. Try adjusting ZSTD_c_minMatch.");
+
+    ZSTD_memcpy(updatedRepcodes.rep, cctx->blockState.prevCBlock->rep, sizeof(Repcodes_t));
+
+    /* check end condition */
+    assert(nbSequences >= 1);
+    assert(inSeqs[nbSequences-1].matchLength == 0);
+    assert(inSeqs[nbSequences-1].offset == 0);
+
+    /* Convert Sequences from public format to internal format */
+    if (!repcodeResolution) {
+        size_t const longl = convertSequences_noRepcodes(cctx->seqStore.sequencesStart, inSeqs, nbSequences-1);
+        cctx->seqStore.sequences = cctx->seqStore.sequencesStart + nbSequences-1;
+        if (longl) {
+            DEBUGLOG(5, "long length");
+            assert(cctx->seqStore.longLengthType == ZSTD_llt_none);
+            if (longl <= nbSequences-1) {
+                DEBUGLOG(5, "long match length detected at pos %zu", longl-1);
+                cctx->seqStore.longLengthType = ZSTD_llt_matchLength;
+                cctx->seqStore.longLengthPos = (U32)(longl-1);
+            } else {
+                DEBUGLOG(5, "long literals length detected at pos %zu", longl-nbSequences);
+                assert(longl <= 2* (nbSequences-1));
+                cctx->seqStore.longLengthType = ZSTD_llt_literalLength;
+                cctx->seqStore.longLengthPos = (U32)(longl-(nbSequences-1)-1);
+            }
+        }
+    } else {
+        for (seqNb = 0; seqNb < nbSequences - 1 ; seqNb++) {
+            U32 const litLength = inSeqs[seqNb].litLength;
+            U32 const matchLength = inSeqs[seqNb].matchLength;
+            U32 const ll0 = (litLength == 0);
+            U32 const offBase = ZSTD_finalizeOffBase(inSeqs[seqNb].offset, updatedRepcodes.rep, ll0);
+
+            DEBUGLOG(6, "Storing sequence: (of: %u, ml: %u, ll: %u)", offBase, matchLength, litLength);
+            ZSTD_storeSeqOnly(&cctx->seqStore, litLength, offBase, matchLength);
+            ZSTD_updateRep(updatedRepcodes.rep, offBase, ll0);
+        }
+    }
+
+    /* If we skipped repcode search while parsing, we need to update repcodes now */
+    if (!repcodeResolution && nbSequences > 1) {
+        U32* const rep = updatedRepcodes.rep;
+
+        if (nbSequences >= 4) {
+            U32 lastSeqIdx = (U32)nbSequences - 2; /* index of last full sequence */
+            rep[2] = inSeqs[lastSeqIdx - 2].offset;
+            rep[1] = inSeqs[lastSeqIdx - 1].offset;
+            rep[0] = inSeqs[lastSeqIdx].offset;
+        } else if (nbSequences == 3) {
+            rep[2] = rep[0];
+            rep[1] = inSeqs[0].offset;
+            rep[0] = inSeqs[1].offset;
+        } else {
+            assert(nbSequences == 2);
+            rep[2] = rep[1];
+            rep[1] = rep[0];
+            rep[0] = inSeqs[0].offset;
+        }
+    }
+
+    ZSTD_memcpy(cctx->blockState.nextCBlock->rep, updatedRepcodes.rep, sizeof(Repcodes_t));
+
+    return 0;
+}
+
+#if defined(ZSTD_ARCH_X86_AVX2)
+
+BlockSummary ZSTD_get1BlockSummary(const ZSTD_Sequence* seqs, size_t nbSeqs)
+{
+    size_t i;
+    __m256i const zeroVec = _mm256_setzero_si256();
+    __m256i sumVec = zeroVec;     /* accumulates match+lit in 32-bit lanes */
+    ZSTD_ALIGNED(32) U32 tmp[8];  /* temporary buffer for reduction */
+    size_t mSum = 0, lSum = 0;
+    ZSTD_STATIC_ASSERT(sizeof(ZSTD_Sequence) == 16);
+
+    /* Process 2 structs (32 bytes) at a time */
+    for (i = 0; i + 2 <= nbSeqs; i += 2) {
+        /* Load two consecutive ZSTD_Sequence (8×4 = 32 bytes) */
+        __m256i data = _mm256_loadu_si256((const __m256i*)(const void*)&seqs[i]);
+        /* check end of block signal */
+        __m256i cmp = _mm256_cmpeq_epi32(data, zeroVec);
+        int cmp_res = _mm256_movemask_epi8(cmp);
+        /* indices for match lengths correspond to bits [8..11], [24..27]
+         * => combined mask = 0x0F000F00 */
+        ZSTD_STATIC_ASSERT(offsetof(ZSTD_Sequence, matchLength) == 8);
+        if (cmp_res & 0x0F000F00) break;
+        /* Accumulate in sumVec */
+        sumVec = _mm256_add_epi32(sumVec, data);
+    }
+
+    /* Horizontal reduction */
+    _mm256_store_si256((__m256i*)tmp, sumVec);
+    lSum = tmp[1] + tmp[5];
+    mSum = tmp[2] + tmp[6];
+
+    /* Handle the leftover */
+    for (; i < nbSeqs; i++) {
+        lSum += seqs[i].litLength;
+        mSum += seqs[i].matchLength;
+        if (seqs[i].matchLength == 0) break;  /* end of block */
+    }
+
+    if (i==nbSeqs) {
+        /* reaching end of sequences: end of block signal was not present */
+        BlockSummary bs;
+        bs.nbSequences = ERROR(externalSequences_invalid);
+        return bs;
+    }
+    {   BlockSummary bs;
+        bs.nbSequences = i+1;
+        bs.blockSize = lSum + mSum;
+        bs.litSize = lSum;
+        return bs;
+    }
+}
+
+#else
+
+BlockSummary ZSTD_get1BlockSummary(const ZSTD_Sequence* seqs, size_t nbSeqs)
+{
+    size_t totalMatchSize = 0;
+    size_t litSize = 0;
+    size_t n;
+    assert(seqs);
+    for (n=0; n<nbSeqs; n++) {
+        totalMatchSize += seqs[n].matchLength;
+        litSize += seqs[n].litLength;
+        if (seqs[n].matchLength == 0) {
+            assert(seqs[n].offset == 0);
+            break;
+        }
+    }
+    if (n==nbSeqs) {
+        BlockSummary bs;
+        bs.nbSequences = ERROR(externalSequences_invalid);
+        return bs;
+    }
+    {   BlockSummary bs;
+        bs.nbSequences = n+1;
+        bs.blockSize = litSize + totalMatchSize;
+        bs.litSize = litSize;
+        return bs;
+    }
+}
+
+#endif
+
+static size_t
+ZSTD_compressSequencesAndLiterals_internal(ZSTD_CCtx* cctx,
+                void* dst, size_t dstCapacity,
+                const ZSTD_Sequence* inSeqs, size_t nbSequences,
+                const void* literals, size_t litSize, size_t srcSize)
+{
+    size_t remaining = srcSize;
+    size_t cSize = 0;
+    BYTE* op = (BYTE*)dst;
+    int const repcodeResolution = (cctx->appliedParams.searchForExternalRepcodes == ZSTD_ps_enable);
+    assert(cctx->appliedParams.searchForExternalRepcodes != ZSTD_ps_auto);
+
+    DEBUGLOG(4, "ZSTD_compressSequencesAndLiterals_internal: nbSeqs=%zu, litSize=%zu", nbSequences, litSize);
+    RETURN_ERROR_IF(nbSequences == 0, externalSequences_invalid, "Requires at least 1 end-of-block");
+
+    /* Special case: empty frame */
+    if ((nbSequences == 1) && (inSeqs[0].litLength == 0)) {
+        U32 const cBlockHeader24 = 1 /* last block */ + (((U32)bt_raw)<<1);
+        RETURN_ERROR_IF(dstCapacity<3, dstSize_tooSmall, "No room for empty frame block header");
+        MEM_writeLE24(op, cBlockHeader24);
+        op += ZSTD_blockHeaderSize;
+        dstCapacity -= ZSTD_blockHeaderSize;
+        cSize += ZSTD_blockHeaderSize;
+    }
+
+    while (nbSequences) {
+        size_t compressedSeqsSize, cBlockSize, conversionStatus;
+        BlockSummary const block = ZSTD_get1BlockSummary(inSeqs, nbSequences);
+        U32 const lastBlock = (block.nbSequences == nbSequences);
+        FORWARD_IF_ERROR(block.nbSequences, "Error while trying to determine nb of sequences for a block");
+        assert(block.nbSequences <= nbSequences);
+        RETURN_ERROR_IF(block.litSize > litSize, externalSequences_invalid, "discrepancy: Sequences require more literals than present in buffer");
+        ZSTD_resetSeqStore(&cctx->seqStore);
+
+        conversionStatus = ZSTD_convertBlockSequences(cctx,
+                            inSeqs, block.nbSequences,
+                            repcodeResolution);
+        FORWARD_IF_ERROR(conversionStatus, "Bad sequence conversion");
+        inSeqs +=
block.nbSequences; + nbSequences -=3D block.nbSequences; + remaining -=3D block.blockSize; + + /* Note: when blockSize is very small, other variant send it uncom= pressed. + * Here, we still send the sequences, because we don't have the or= iginal source to send it uncompressed. + * One could imagine in theory reproducing the source from the seq= uences, + * but that's complex and costly memory intensive, and goes agains= t the objectives of this variant. */ + + RETURN_ERROR_IF(dstCapacity < ZSTD_blockHeaderSize, dstSize_tooSma= ll, "not enough dstCapacity to write a new compressed block"); + + compressedSeqsSize =3D ZSTD_entropyCompressSeqStore_internal( + op + ZSTD_blockHeaderSize /* Leave space f= or block header */, dstCapacity - ZSTD_blockHeaderSize, + literals, block.litSize, + &cctx->seqStore, + &cctx->blockState.prevCBlock->entropy, &cc= tx->blockState.nextCBlock->entropy, + &cctx->appliedParams, + cctx->tmpWorkspace, cctx->tmpWkspSize /* s= tatically allocated in resetCCtx */, + cctx->bmi2); + FORWARD_IF_ERROR(compressedSeqsSize, "Compressing sequences of blo= ck failed"); + /* note: the spec forbids for any compressed block to be larger th= an maximum block size */ + if (compressedSeqsSize > cctx->blockSizeMax) compressedSeqsSize = =3D 0; + DEBUGLOG(5, "Compressed sequences size: %zu", compressedSeqsSize); + litSize -=3D block.litSize; + literals =3D (const char*)literals + block.litSize; + + /* Note: difficult to check source for RLE block when only Literal= s are provided, + * but it could be considered from analyzing the sequence directly= */ + + if (compressedSeqsSize =3D=3D 0) { + /* Sending uncompressed blocks is out of reach, because the so= urce is not provided. + * In theory, one could use the sequences to regenerate the so= urce, like a decompressor, + * but it's complex, and memory hungry, killing the purpose of= this variant. + * Current outcome: generate an error code. 
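+             * (Callers who cannot guarantee compressible input may prefer
+             * ZSTD_compressSequences(), which still has the source buffer
+             * available and can emit raw blocks as a fallback.)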
+ */ + RETURN_ERROR(cannotProduce_uncompressedBlock, "ZSTD_compressSe= quencesAndLiterals cannot generate an uncompressed block"); + } else { + U32 cBlockHeader; + assert(compressedSeqsSize > 1); /* no RLE */ + /* Error checking and repcodes update */ + ZSTD_blockState_confirmRepcodesAndEntropyTables(&cctx->blockSt= ate); + if (cctx->blockState.prevCBlock->entropy.fse.offcode_repeatMod= e =3D=3D FSE_repeat_valid) + cctx->blockState.prevCBlock->entropy.fse.offcode_repeatMod= e =3D FSE_repeat_check; + + /* Write block header into beginning of block*/ + cBlockHeader =3D lastBlock + (((U32)bt_compressed)<<1) + (U32)= (compressedSeqsSize << 3); + MEM_writeLE24(op, cBlockHeader); + cBlockSize =3D ZSTD_blockHeaderSize + compressedSeqsSize; + DEBUGLOG(5, "Writing out compressed block, size: %zu", cBlockS= ize); + } + + cSize +=3D cBlockSize; + op +=3D cBlockSize; + dstCapacity -=3D cBlockSize; + cctx->isFirstBlock =3D 0; + DEBUGLOG(5, "cSize running total: %zu (remaining dstCapacity=3D%zu= )", cSize, dstCapacity); + + if (lastBlock) { + assert(nbSequences =3D=3D 0); + break; + } + } + + RETURN_ERROR_IF(litSize !=3D 0, externalSequences_invalid, "literals m= ust be entirely and exactly consumed"); + RETURN_ERROR_IF(remaining !=3D 0, externalSequences_invalid, "Sequence= s must represent a total of exactly srcSize=3D%zu", srcSize); + DEBUGLOG(4, "cSize final total: %zu", cSize); + return cSize; +} + +size_t +ZSTD_compressSequencesAndLiterals(ZSTD_CCtx* cctx, + void* dst, size_t dstCapacity, + const ZSTD_Sequence* inSeqs, size_t inSeqsSize, + const void* literals, size_t litSize, size_t litCapaci= ty, + size_t decompressedSize) +{ + BYTE* op =3D (BYTE*)dst; + size_t cSize =3D 0; + + /* Transparent initialization stage, same as compressStream2() */ + DEBUGLOG(4, "ZSTD_compressSequencesAndLiterals (dstCapacity=3D%zu)", d= stCapacity); + assert(cctx !=3D NULL); + if (litCapacity < litSize) { + RETURN_ERROR(workSpace_tooSmall, "literals buffer is not large eno= ugh: must be at least 8 bytes larger than litSize (risk of read out-of-boun= d)"); + } + FORWARD_IF_ERROR(ZSTD_CCtx_init_compressStream2(cctx, ZSTD_e_end, deco= mpressedSize), "CCtx initialization failed"); + + if (cctx->appliedParams.blockDelimiters =3D=3D ZSTD_sf_noBlockDelimite= rs) { + RETURN_ERROR(frameParameter_unsupported, "This mode is only compat= ible with explicit delimiters"); + } + if (cctx->appliedParams.validateSequences) { + RETURN_ERROR(parameter_unsupported, "This mode is not compatible w= ith Sequence validation"); + } + if (cctx->appliedParams.fParams.checksumFlag) { + RETURN_ERROR(frameParameter_unsupported, "this mode is not compati= ble with frame checksum"); + } + + /* Begin writing output, starting with frame header */ + { size_t const frameHeaderSize =3D ZSTD_writeFrameHeader(op, dstCapa= city, + &cctx->appliedParams, decompressedSize, cctx->dictID); + op +=3D frameHeaderSize; + assert(frameHeaderSize <=3D dstCapacity); + dstCapacity -=3D frameHeaderSize; + cSize +=3D frameHeaderSize; + } + + /* Now generate compressed blocks */ + { size_t const cBlocksSize =3D ZSTD_compressSequencesAndLiterals_int= ernal(cctx, + op, dstCapacity, + inSeqs, inSeqsSize, + literals, litSize, decompresse= dSize); + FORWARD_IF_ERROR(cBlocksSize, "Compressing blocks failed!"); + cSize +=3D cBlocksSize; + assert(cBlocksSize <=3D dstCapacity); + dstCapacity -=3D cBlocksSize; + } + + DEBUGLOG(4, "Final compressed size: %zu", cSize); return cSize; } =20 /*=3D=3D=3D=3D=3D=3D Finalize =3D=3D=3D=3D=3D=3D*/ =20 +static ZSTD_inBuffer 
inBuffer_forEndFlush(const ZSTD_CStream* zcs) +{ + const ZSTD_inBuffer nullInput =3D { NULL, 0, 0 }; + const int stableInput =3D (zcs->appliedParams.inBufferMode =3D=3D ZSTD= _bm_stable); + return stableInput ? zcs->expectedInBuffer : nullInput; +} + /*! ZSTD_flushStream() : * @return : amount of data remaining to flush */ size_t ZSTD_flushStream(ZSTD_CStream* zcs, ZSTD_outBuffer* output) { - ZSTD_inBuffer input =3D { NULL, 0, 0 }; + ZSTD_inBuffer input =3D inBuffer_forEndFlush(zcs); + input.size =3D input.pos; /* do not ingest more input during flush */ return ZSTD_compressStream2(zcs, output, &input, ZSTD_e_flush); } =20 - size_t ZSTD_endStream(ZSTD_CStream* zcs, ZSTD_outBuffer* output) { - ZSTD_inBuffer input =3D { NULL, 0, 0 }; + ZSTD_inBuffer input =3D inBuffer_forEndFlush(zcs); size_t const remainingToFlush =3D ZSTD_compressStream2(zcs, output, &i= nput, ZSTD_e_end); - FORWARD_IF_ERROR( remainingToFlush , "ZSTD_compressStream2 failed"); + FORWARD_IF_ERROR(remainingToFlush , "ZSTD_compressStream2(,,ZSTD_e_end= ) failed"); if (zcs->appliedParams.nbWorkers > 0) return remainingToFlush; /* mi= nimal estimation */ /* single thread mode : attempt to calculate remaining to flush more p= recisely */ { size_t const lastBlockSize =3D zcs->frameEnded ? 0 : ZSTD_BLOCKHEA= DERSIZE; @@ -6046,7 +7522,7 @@ static void ZSTD_dedicatedDictSearch_revertCParams( } } =20 -static U64 ZSTD_getCParamRowSize(U64 srcSizeHint, size_t dictSize, ZSTD_cP= aramMode_e mode) +static U64 ZSTD_getCParamRowSize(U64 srcSizeHint, size_t dictSize, ZSTD_CP= aramMode_e mode) { switch (mode) { case ZSTD_cpm_unknown: @@ -6070,8 +7546,8 @@ static U64 ZSTD_getCParamRowSize(U64 srcSizeHint, siz= e_t dictSize, ZSTD_cParamMo * @return ZSTD_compressionParameters structure for a selected compression= level, srcSize and dictSize. * Note: srcSizeHint 0 means 0, use ZSTD_CONTENTSIZE_UNKNOWN for unknown. * Use dictSize =3D=3D 0 for unknown or unused. - * Note: `mode` controls how we treat the `dictSize`. See docs for `ZSTD_= cParamMode_e`. */ -static ZSTD_compressionParameters ZSTD_getCParams_internal(int compression= Level, unsigned long long srcSizeHint, size_t dictSize, ZSTD_cParamMode_e m= ode) + * Note: `mode` controls how we treat the `dictSize`. See docs for `ZSTD_= CParamMode_e`. */ +static ZSTD_compressionParameters ZSTD_getCParams_internal(int compression= Level, unsigned long long srcSizeHint, size_t dictSize, ZSTD_CParamMode_e m= ode) { U64 const rSize =3D ZSTD_getCParamRowSize(srcSizeHint, dictSize, mode); U32 const tableID =3D (rSize <=3D 256 KB) + (rSize <=3D 128 KB) + (rSi= ze <=3D 16 KB); @@ -6092,7 +7568,7 @@ static ZSTD_compressionParameters ZSTD_getCParams_int= ernal(int compressionLevel, cp.targetLength =3D (unsigned)(-clampedCompressionLevel); } /* refine parameters based on srcSize & dictSize */ - return ZSTD_adjustCParams_internal(cp, srcSizeHint, dictSize, mode= ); + return ZSTD_adjustCParams_internal(cp, srcSizeHint, dictSize, mode= , ZSTD_ps_auto); } } =20 @@ -6109,7 +7585,9 @@ ZSTD_compressionParameters ZSTD_getCParams(int compre= ssionLevel, unsigned long l * same idea as ZSTD_getCParams() * @return a `ZSTD_parameters` structure (instead of `ZSTD_compressionPara= meters`). 
* Fields of `ZSTD_frameParameters` are set to default values */ -static ZSTD_parameters ZSTD_getParams_internal(int compressionLevel, unsig= ned long long srcSizeHint, size_t dictSize, ZSTD_cParamMode_e mode) { +static ZSTD_parameters +ZSTD_getParams_internal(int compressionLevel, unsigned long long srcSizeHi= nt, size_t dictSize, ZSTD_CParamMode_e mode) +{ ZSTD_parameters params; ZSTD_compressionParameters const cParams =3D ZSTD_getCParams_internal(= compressionLevel, srcSizeHint, dictSize, mode); DEBUGLOG(5, "ZSTD_getParams (cLevel=3D%i)", compressionLevel); @@ -6123,7 +7601,34 @@ static ZSTD_parameters ZSTD_getParams_internal(int c= ompressionLevel, unsigned lo * same idea as ZSTD_getCParams() * @return a `ZSTD_parameters` structure (instead of `ZSTD_compressionPara= meters`). * Fields of `ZSTD_frameParameters` are set to default values */ -ZSTD_parameters ZSTD_getParams(int compressionLevel, unsigned long long sr= cSizeHint, size_t dictSize) { +ZSTD_parameters ZSTD_getParams(int compressionLevel, unsigned long long sr= cSizeHint, size_t dictSize) +{ if (srcSizeHint =3D=3D 0) srcSizeHint =3D ZSTD_CONTENTSIZE_UNKNOWN; return ZSTD_getParams_internal(compressionLevel, srcSizeHint, dictSize= , ZSTD_cpm_unknown); } + +void ZSTD_registerSequenceProducer( + ZSTD_CCtx* zc, + void* extSeqProdState, + ZSTD_sequenceProducer_F extSeqProdFunc) +{ + assert(zc !=3D NULL); + ZSTD_CCtxParams_registerSequenceProducer( + &zc->requestedParams, extSeqProdState, extSeqProdFunc + ); +} + +void ZSTD_CCtxParams_registerSequenceProducer( + ZSTD_CCtx_params* params, + void* extSeqProdState, + ZSTD_sequenceProducer_F extSeqProdFunc) +{ + assert(params !=3D NULL); + if (extSeqProdFunc !=3D NULL) { + params->extSeqProdFunc =3D extSeqProdFunc; + params->extSeqProdState =3D extSeqProdState; + } else { + params->extSeqProdFunc =3D NULL; + params->extSeqProdState =3D NULL; + } +} diff --git a/lib/zstd/compress/zstd_compress_internal.h b/lib/zstd/compress= /zstd_compress_internal.h index 71697a11ae30..b10978385876 100644 --- a/lib/zstd/compress/zstd_compress_internal.h +++ b/lib/zstd/compress/zstd_compress_internal.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -20,7 +21,8 @@ ***************************************/ #include "../common/zstd_internal.h" #include "zstd_cwksp.h" - +#include "../common/bits.h" /* ZSTD_highbit32, ZSTD_NbCommonBytes */ +#include "zstd_preSplit.h" /* ZSTD_SLIPBLOCK_WORKSPACESIZE */ =20 /*-************************************* * Constants @@ -32,7 +34,7 @@ It's not a big deal though : candid= ate will just be sorted again. Additionally, candidate position 1 = will be lost. But candidate 1 cannot hide a large= tree of candidates, so it's a minimal loss. - The benefit is that ZSTD_DUBT_UNSOR= TED_MARK cannot be mishandled after table re-use with a different strategy. + The benefit is that ZSTD_DUBT_UNSOR= TED_MARK cannot be mishandled after table reuse with a different strategy. 
This constant is required by ZSTD_c= ompressBlock_btlazy2() and ZSTD_reduceTable_internal() */ =20 =20 @@ -75,6 +77,70 @@ typedef struct { ZSTD_fseCTables_t fse; } ZSTD_entropyCTables_t; =20 +/* ********************************************* +* Sequences * +***********************************************/ +typedef struct SeqDef_s { + U32 offBase; /* offBase =3D=3D Offset + ZSTD_REP_NUM, or repcode 1,2= ,3 */ + U16 litLength; + U16 mlBase; /* mlBase =3D=3D matchLength - MINMATCH */ +} SeqDef; + +/* Controls whether seqStore has a single "long" litLength or matchLength.= See SeqStore_t. */ +typedef enum { + ZSTD_llt_none =3D 0, /* no longLengthType */ + ZSTD_llt_literalLength =3D 1, /* represents a long literal */ + ZSTD_llt_matchLength =3D 2 /* represents a long match */ +} ZSTD_longLengthType_e; + +typedef struct { + SeqDef* sequencesStart; + SeqDef* sequences; /* ptr to end of sequences */ + BYTE* litStart; + BYTE* lit; /* ptr to end of literals */ + BYTE* llCode; + BYTE* mlCode; + BYTE* ofCode; + size_t maxNbSeq; + size_t maxNbLit; + + /* longLengthPos and longLengthType to allow us to represent either a = single litLength or matchLength + * in the seqStore that has a value larger than U16 (if it exists). To= do so, we increment + * the existing value of the litLength or matchLength by 0x10000. + */ + ZSTD_longLengthType_e longLengthType; + U32 longLengthPos; /* Index of the sequence to appl= y long length modification to */ +} SeqStore_t; + +typedef struct { + U32 litLength; + U32 matchLength; +} ZSTD_SequenceLength; + +/* + * Returns the ZSTD_SequenceLength for the given sequences. It handles the= decoding of long sequences + * indicated by longLengthPos and longLengthType, and adds MINMATCH back t= o matchLength. + */ +MEM_STATIC ZSTD_SequenceLength ZSTD_getSequenceLength(SeqStore_t const* se= qStore, SeqDef const* seq) +{ + ZSTD_SequenceLength seqLen; + seqLen.litLength =3D seq->litLength; + seqLen.matchLength =3D seq->mlBase + MINMATCH; + if (seqStore->longLengthPos =3D=3D (U32)(seq - seqStore->sequencesStar= t)) { + if (seqStore->longLengthType =3D=3D ZSTD_llt_literalLength) { + seqLen.litLength +=3D 0x10000; + } + if (seqStore->longLengthType =3D=3D ZSTD_llt_matchLength) { + seqLen.matchLength +=3D 0x10000; + } + } + return seqLen; +} + +const SeqStore_t* ZSTD_getSeqStore(const ZSTD_CCtx* ctx); /* compress & = dictBuilder */ +int ZSTD_seqToCodes(const SeqStore_t* seqStorePtr); /* compress, dictBui= lder, decodeCorpus (shouldn't get its definition from here) */ + + /* ********************************************* * Entropy buffer statistics structs and funcs * ***********************************************/ @@ -84,7 +150,7 @@ typedef struct { * hufDesSize refers to the size of huffman tree description in bytes. * This metadata is populated in ZSTD_buildBlockEntropyStats_literals() */ typedef struct { - symbolEncodingType_e hType; + SymbolEncodingType_e hType; BYTE hufDesBuffer[ZSTD_MAX_HUF_HEADER_SIZE]; size_t hufDesSize; } ZSTD_hufCTablesMetadata_t; @@ -95,9 +161,9 @@ typedef struct { * fseTablesSize refers to the size of fse tables in bytes. * This metadata is populated in ZSTD_buildBlockEntropyStats_sequences() = */ typedef struct { - symbolEncodingType_e llType; - symbolEncodingType_e ofType; - symbolEncodingType_e mlType; + SymbolEncodingType_e llType; + SymbolEncodingType_e ofType; + SymbolEncodingType_e mlType; BYTE fseTablesBuffer[ZSTD_MAX_FSE_HEADERS_SIZE]; size_t fseTablesSize; size_t lastCountSize; /* This is to account for bug in 1.3.4. 
More det= ail in ZSTD_entropyCompressSeqStore_internal() */ @@ -111,12 +177,13 @@ typedef struct { /* ZSTD_buildBlockEntropyStats() : * Builds entropy for the block. * @return : 0 on success or error code */ -size_t ZSTD_buildBlockEntropyStats(seqStore_t* seqStorePtr, - const ZSTD_entropyCTables_t* prevEntropy, - ZSTD_entropyCTables_t* nextEntropy, - const ZSTD_CCtx_params* cctxParams, - ZSTD_entropyCTablesMetadata_t* entropyM= etadata, - void* workspace, size_t wkspSize); +size_t ZSTD_buildBlockEntropyStats( + const SeqStore_t* seqStorePtr, + const ZSTD_entropyCTables_t* prevEntropy, + ZSTD_entropyCTables_t* nextEntropy, + const ZSTD_CCtx_params* cctxParams, + ZSTD_entropyCTablesMetadata_t* entropyMetadata, + void* workspace, size_t wkspSize); =20 /* ******************************* * Compression internals structs * @@ -140,28 +207,29 @@ typedef struct { stopped. posInSequence <=3D seq[pos].litLength = + seq[pos].matchLength */ size_t size; /* The number of sequences. <=3D capacity. */ size_t capacity; /* The capacity starting from `seq` pointer */ -} rawSeqStore_t; +} RawSeqStore_t; =20 -UNUSED_ATTR static const rawSeqStore_t kNullRawSeqStore =3D {NULL, 0, 0, 0= , 0}; +UNUSED_ATTR static const RawSeqStore_t kNullRawSeqStore =3D {NULL, 0, 0, 0= , 0}; =20 typedef struct { - int price; - U32 off; - U32 mlen; - U32 litlen; - U32 rep[ZSTD_REP_NUM]; + int price; /* price from beginning of segment to this position */ + U32 off; /* offset of previous match */ + U32 mlen; /* length of previous match */ + U32 litlen; /* nb of literals since previous match */ + U32 rep[ZSTD_REP_NUM]; /* offset history after previous match */ } ZSTD_optimal_t; =20 typedef enum { zop_dynamic=3D0, zop_predef } ZSTD_OptPrice_e; =20 +#define ZSTD_OPT_SIZE (ZSTD_OPT_NUM+3) typedef struct { /* All tables are allocated inside cctx->workspace by ZSTD_resetCCtx_i= nternal() */ unsigned* litFreq; /* table of literals statistics, of size = 256 */ unsigned* litLengthFreq; /* table of litLength statistics, of size= (MaxLL+1) */ unsigned* matchLengthFreq; /* table of matchLength statistics, of si= ze (MaxML+1) */ unsigned* offCodeFreq; /* table of offCode statistics, of size (= MaxOff+1) */ - ZSTD_match_t* matchTable; /* list of found matches, of size ZSTD_OP= T_NUM+1 */ - ZSTD_optimal_t* priceTable; /* All positions tracked by optimal parse= r, of size ZSTD_OPT_NUM+1 */ + ZSTD_match_t* matchTable; /* list of found matches, of size ZSTD_OP= T_SIZE */ + ZSTD_optimal_t* priceTable; /* All positions tracked by optimal parse= r, of size ZSTD_OPT_SIZE */ =20 U32 litSum; /* nb of literals */ U32 litLengthSum; /* nb of litLength codes */ @@ -173,7 +241,7 @@ typedef struct { U32 offCodeSumBasePrice; /* to compare to log2(offreq) */ ZSTD_OptPrice_e priceType; /* prices can be determined dynamically, = or follow a pre-defined cost structure */ const ZSTD_entropyCTables_t* symbolCosts; /* pre-calculated dictionar= y statistics */ - ZSTD_paramSwitch_e literalCompressionMode; + ZSTD_ParamSwitch_e literalCompressionMode; } optState_t; =20 typedef struct { @@ -195,11 +263,11 @@ typedef struct { =20 #define ZSTD_WINDOW_START_INDEX 2 =20 -typedef struct ZSTD_matchState_t ZSTD_matchState_t; +typedef struct ZSTD_MatchState_t ZSTD_MatchState_t; =20 #define ZSTD_ROW_HASH_CACHE_SIZE 8 /* Size of prefetching hash cache= for row-based matchfinder */ =20 -struct ZSTD_matchState_t { +struct ZSTD_MatchState_t { ZSTD_window_t window; /* State for window round buffer management */ U32 loadedDictEnd; /* index of end of dictionary, within context'= s 
referential. * When loadedDictEnd !=3D 0, a dictionary is = in use, and still valid. @@ -212,28 +280,42 @@ struct ZSTD_matchState_t { U32 hashLog3; /* dispatch table for matches of len=3D=3D3 : = larger =3D=3D faster, more memory */ =20 U32 rowHashLog; /* For row-based matchfinder:= Hashlog based on nb of rows in the hashTable.*/ - U16* tagTable; /* For row-based matchFinder:= A row-based table containing the hashes and head index. */ + BYTE* tagTable; /* For row-based matchFinder:= A row-based table containing the hashes and head index. */ U32 hashCache[ZSTD_ROW_HASH_CACHE_SIZE]; /* For row-based matchFinder:= a cache of hashes to improve speed */ + U64 hashSalt; /* For row-based matchFinder:= salts the hash for reuse of tag table */ + U32 hashSaltEntropy; /* For row-based matchFinder:= collects entropy for salt generation */ =20 U32* hashTable; U32* hashTable3; U32* chainTable; =20 - U32 forceNonContiguous; /* Non-zero if we should force non-contiguous = load for the next window update. */ + int forceNonContiguous; /* Non-zero if we should force non-contiguous = load for the next window update. */ =20 int dedicatedDictSearch; /* Indicates whether this matchState is usin= g the * dedicated dictionary search structure. */ optState_t opt; /* optimal parser state */ - const ZSTD_matchState_t* dictMatchState; + const ZSTD_MatchState_t* dictMatchState; ZSTD_compressionParameters cParams; - const rawSeqStore_t* ldmSeqStore; + const RawSeqStore_t* ldmSeqStore; + + /* Controls prefetching in some dictMatchState matchfinders. + * This behavior is controlled from the cctx ms. + * This parameter has no effect in the cdict ms. */ + int prefetchCDictTables; + + /* When =3D=3D 0, lazy match finders insert every position. + * When !=3D 0, lazy match finders only insert positions they search. + * This allows them to skip much faster over incompressible data, + * at a small cost to compression ratio. + */ + int lazySkipping; }; =20 typedef struct { ZSTD_compressedBlockState_t* prevCBlock; ZSTD_compressedBlockState_t* nextCBlock; - ZSTD_matchState_t matchState; + ZSTD_MatchState_t matchState; } ZSTD_blockState_t; =20 typedef struct { @@ -260,7 +342,7 @@ typedef struct { } ldmState_t; =20 typedef struct { - ZSTD_paramSwitch_e enableLdm; /* ZSTD_ps_enable to enable LDM. ZSTD_ps= _auto by default */ + ZSTD_ParamSwitch_e enableLdm; /* ZSTD_ps_enable to enable LDM. ZSTD_ps= _auto by default */ U32 hashLog; /* Log size of hashTable */ U32 bucketSizeLog; /* Log bucket size for collision resolution, a= t most 8 */ U32 minMatchLength; /* Minimum match length */ @@ -291,7 +373,7 @@ struct ZSTD_CCtx_params_s { * There is no guarantee that hint is close= to actual source size */ =20 ZSTD_dictAttachPref_e attachDictPref; - ZSTD_paramSwitch_e literalCompressionMode; + ZSTD_ParamSwitch_e literalCompressionMode; =20 /* Multithreading: used to pass parameters to mtctx */ int nbWorkers; @@ -310,24 +392,54 @@ struct ZSTD_CCtx_params_s { ZSTD_bufferMode_e outBufferMode; =20 /* Sequence compression API */ - ZSTD_sequenceFormat_e blockDelimiters; + ZSTD_SequenceFormat_e blockDelimiters; int validateSequences; =20 - /* Block splitting */ - ZSTD_paramSwitch_e useBlockSplitter; + /* Block splitting + * @postBlockSplitter executes split analysis after sequences are prod= uced, + * it's more accurate but consumes more resources. + * @preBlockSplitter_level splits before knowing sequences, + * it's more approximative but also cheaper. + * Valid @preBlockSplitter_level values range from 0 to 6 (included). 
+ * 0 means auto, 1 means do not split, + * then levels are sorted in increasing cpu budget, from 2 (fastest) t= o 6 (slowest). + * Highest @preBlockSplitter_level combines well with @postBlockSplitt= er. + */ + ZSTD_ParamSwitch_e postBlockSplitter; + int preBlockSplitter_level; + + /* Adjust the max block size*/ + size_t maxBlockSize; =20 /* Param for deciding whether to use row-based matchfinder */ - ZSTD_paramSwitch_e useRowMatchFinder; + ZSTD_ParamSwitch_e useRowMatchFinder; =20 /* Always load a dictionary in ext-dict mode (not prefix mode)? */ int deterministicRefPrefix; =20 /* Internal use, for createCCtxParams() and freeCCtxParams() only */ ZSTD_customMem customMem; + + /* Controls prefetching in some dictMatchState matchfinders */ + ZSTD_ParamSwitch_e prefetchCDictTables; + + /* Controls whether zstd will fall back to an internal matchfinder + * if the external matchfinder returns an error code. */ + int enableMatchFinderFallback; + + /* Parameters for the external sequence producer API. + * Users set these parameters through ZSTD_registerSequenceProducer(). + * It is not possible to set these parameters individually through the= public API. */ + void* extSeqProdState; + ZSTD_sequenceProducer_F extSeqProdFunc; + + /* Controls repcode search in external sequence parsing */ + ZSTD_ParamSwitch_e searchForExternalRepcodes; }; /* typedef'd to ZSTD_CCtx_params within "zstd.h" */ =20 #define COMPRESS_SEQUENCES_WORKSPACE_SIZE (sizeof(unsigned) * (MaxSeq + 2)) #define ENTROPY_WORKSPACE_SIZE (HUF_WORKSPACE_SIZE + COMPRESS_SEQUENCES_WO= RKSPACE_SIZE) +#define TMP_WORKSPACE_SIZE (MAX(ENTROPY_WORKSPACE_SIZE, ZSTD_SLIPBLOCK_WOR= KSPACESIZE)) =20 /* * Indicates whether this compression proceeds directly from user-provided @@ -345,11 +457,11 @@ typedef enum { */ #define ZSTD_MAX_NB_BLOCK_SPLITS 196 typedef struct { - seqStore_t fullSeqStoreChunk; - seqStore_t firstHalfSeqStore; - seqStore_t secondHalfSeqStore; - seqStore_t currSeqStore; - seqStore_t nextSeqStore; + SeqStore_t fullSeqStoreChunk; + SeqStore_t firstHalfSeqStore; + SeqStore_t secondHalfSeqStore; + SeqStore_t currSeqStore; + SeqStore_t nextSeqStore; =20 U32 partitions[ZSTD_MAX_NB_BLOCK_SPLITS]; ZSTD_entropyCTablesMetadata_t entropyMetadata; @@ -366,7 +478,7 @@ struct ZSTD_CCtx_s { size_t dictContentSize; =20 ZSTD_cwksp workspace; /* manages buffer for dynamic allocations */ - size_t blockSize; + size_t blockSizeMax; unsigned long long pledgedSrcSizePlusOne; /* this way, 0 (default) = =3D=3D unknown */ unsigned long long consumedSrcSize; unsigned long long producedCSize; @@ -378,13 +490,14 @@ struct ZSTD_CCtx_s { int isFirstBlock; int initialized; =20 - seqStore_t seqStore; /* sequences storage ptrs */ + SeqStore_t seqStore; /* sequences storage ptrs */ ldmState_t ldmState; /* long distance matching state */ rawSeq* ldmSequences; /* Storage for the ldm output sequences */ size_t maxNbLdmSequences; - rawSeqStore_t externSeqStore; /* Mutable reference to external sequenc= es */ + RawSeqStore_t externSeqStore; /* Mutable reference to external sequenc= es */ ZSTD_blockState_t blockState; - U32* entropyWorkspace; /* entropy workspace of ENTROPY_WORKSPACE_SIZE= bytes */ + void* tmpWorkspace; /* used as substitute of stack space - must be al= igned for S64 type */ + size_t tmpWkspSize; =20 /* Whether we are streaming or not */ ZSTD_buffered_policy_e bufferedPolicy; @@ -404,6 +517,7 @@ struct ZSTD_CCtx_s { =20 /* Stable in/out buffer verification */ ZSTD_inBuffer expectedInBuffer; + size_t stableIn_notConsumed; /* nb bytes within stable input 
buffer th= at are said to be consumed but are not */ size_t expectedOutBufferSize; =20 /* Dictionary */ @@ -417,9 +531,14 @@ struct ZSTD_CCtx_s { =20 /* Workspace for block splitter */ ZSTD_blockSplitCtx blockSplitCtx; + + /* Buffer for output from external sequence producer */ + ZSTD_Sequence* extSeqBuf; + size_t extSeqBufCapacity; }; =20 typedef enum { ZSTD_dtlm_fast, ZSTD_dtlm_full } ZSTD_dictTableLoadMethod_e; +typedef enum { ZSTD_tfp_forCCtx, ZSTD_tfp_forCDict } ZSTD_tableFillPurpose= _e; =20 typedef enum { ZSTD_noDict =3D 0, @@ -441,17 +560,17 @@ typedef enum { * In this mode we take both the source si= ze and the dictionary size * into account when selecting and adjusti= ng the parameters. */ - ZSTD_cpm_unknown =3D 3, /* ZSTD_getCParams, ZSTD_getParams, ZSTD= _adjustParams. + ZSTD_cpm_unknown =3D 3 /* ZSTD_getCParams, ZSTD_getParams, ZSTD= _adjustParams. * We don't know what these parameters are= for. We default to the legacy * behavior of taking both the source size= and the dict size into account * when selecting and adjusting parameters. */ -} ZSTD_cParamMode_e; +} ZSTD_CParamMode_e; =20 -typedef size_t (*ZSTD_blockCompressor) ( - ZSTD_matchState_t* bs, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +typedef size_t (*ZSTD_BlockCompressor_f) ( + ZSTD_MatchState_t* bs, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -ZSTD_blockCompressor ZSTD_selectBlockCompressor(ZSTD_strategy strat, ZSTD_= paramSwitch_e rowMatchfinderMode, ZSTD_dictMode_e dictMode); +ZSTD_BlockCompressor_f ZSTD_selectBlockCompressor(ZSTD_strategy strat, ZST= D_ParamSwitch_e rowMatchfinderMode, ZSTD_dictMode_e dictMode); =20 =20 MEM_STATIC U32 ZSTD_LLcode(U32 litLength) @@ -497,12 +616,33 @@ MEM_STATIC int ZSTD_cParam_withinBounds(ZSTD_cParamet= er cParam, int value) return 1; } =20 +/* ZSTD_selectAddr: + * @return index >=3D lowLimit ? candidate : backup, + * tries to force branchless codegen. */ +MEM_STATIC const BYTE* +ZSTD_selectAddr(U32 index, U32 lowLimit, const BYTE* candidate, const BYTE= * backup) +{ +#if defined(__x86_64__) + __asm__ ( + "cmp %1, %2\n" + "cmova %3, %0\n" + : "+r"(candidate) + : "r"(index), "r"(lowLimit), "r"(backup) + ); + return candidate; +#else + return index >=3D lowLimit ? candidate : backup; +#endif +} + /* ZSTD_noCompressBlock() : * Writes uncompressed block to dst buffer from given src. 
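 * (For reference, a raw block is a 3-byte little-endian header packing
 *  lastBlock, the bt_raw block type, and srcSize<<3, followed by the
 *  srcSize source bytes verbatim.)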
* Returns the size of the block */ -MEM_STATIC size_t ZSTD_noCompressBlock (void* dst, size_t dstCapacity, con= st void* src, size_t srcSize, U32 lastBlock) +MEM_STATIC size_t +ZSTD_noCompressBlock(void* dst, size_t dstCapacity, const void* src, size_= t srcSize, U32 lastBlock) { U32 const cBlockHeader24 =3D lastBlock + (((U32)bt_raw)<<1) + (U32)(sr= cSize << 3); + DEBUGLOG(5, "ZSTD_noCompressBlock (srcSize=3D%zu, dstCapacity=3D%zu)",= srcSize, dstCapacity); RETURN_ERROR_IF(srcSize + ZSTD_blockHeaderSize > dstCapacity, dstSize_tooSmall, "dst buf too small for uncompressed = block"); MEM_writeLE24(dst, cBlockHeader24); @@ -510,7 +650,8 @@ MEM_STATIC size_t ZSTD_noCompressBlock (void* dst, size= _t dstCapacity, const voi return ZSTD_blockHeaderSize + srcSize; } =20 -MEM_STATIC size_t ZSTD_rleCompressBlock (void* dst, size_t dstCapacity, BY= TE src, size_t srcSize, U32 lastBlock) +MEM_STATIC size_t +ZSTD_rleCompressBlock(void* dst, size_t dstCapacity, BYTE src, size_t srcS= ize, U32 lastBlock) { BYTE* const op =3D (BYTE*)dst; U32 const cBlockHeader =3D lastBlock + (((U32)bt_rle)<<1) + (U32)(srcS= ize << 3); @@ -529,7 +670,7 @@ MEM_STATIC size_t ZSTD_minGain(size_t srcSize, ZSTD_str= ategy strat) { U32 const minlog =3D (strat>=3DZSTD_btultra) ? (U32)(strat) - 1 : 6; ZSTD_STATIC_ASSERT(ZSTD_btultra =3D=3D 8); - assert(ZSTD_cParam_withinBounds(ZSTD_c_strategy, strat)); + assert(ZSTD_cParam_withinBounds(ZSTD_c_strategy, (int)strat)); return (srcSize >> minlog) + 2; } =20 @@ -565,29 +706,68 @@ ZSTD_safecopyLiterals(BYTE* op, BYTE const* ip, BYTE = const* const iend, BYTE con while (ip < iend) *op++ =3D *ip++; } =20 -#define ZSTD_REP_MOVE (ZSTD_REP_NUM-1) -#define STORE_REPCODE_1 STORE_REPCODE(1) -#define STORE_REPCODE_2 STORE_REPCODE(2) -#define STORE_REPCODE_3 STORE_REPCODE(3) -#define STORE_REPCODE(r) (assert((r)>=3D1), assert((r)<=3D3), (r)-1) -#define STORE_OFFSET(o) (assert((o)>0), o + ZSTD_REP_MOVE) -#define STORED_IS_OFFSET(o) ((o) > ZSTD_REP_MOVE) -#define STORED_IS_REPCODE(o) ((o) <=3D ZSTD_REP_MOVE) -#define STORED_OFFSET(o) (assert(STORED_IS_OFFSET(o)), (o)-ZSTD_REP_MOVE) -#define STORED_REPCODE(o) (assert(STORED_IS_REPCODE(o)), (o)+1) /* return= s ID 1,2,3 */ -#define STORED_TO_OFFBASE(o) ((o)+1) -#define OFFBASE_TO_STORED(o) ((o)-1) + +#define REPCODE1_TO_OFFBASE REPCODE_TO_OFFBASE(1) +#define REPCODE2_TO_OFFBASE REPCODE_TO_OFFBASE(2) +#define REPCODE3_TO_OFFBASE REPCODE_TO_OFFBASE(3) +#define REPCODE_TO_OFFBASE(r) (assert((r)>=3D1), assert((r)<=3DZSTD_REP_NU= M), (r)) /* accepts IDs 1,2,3 */ +#define OFFSET_TO_OFFBASE(o) (assert((o)>0), o + ZSTD_REP_NUM) +#define OFFBASE_IS_OFFSET(o) ((o) > ZSTD_REP_NUM) +#define OFFBASE_IS_REPCODE(o) ( 1 <=3D (o) && (o) <=3D ZSTD_REP_NUM) +#define OFFBASE_TO_OFFSET(o) (assert(OFFBASE_IS_OFFSET(o)), (o) - ZSTD_RE= P_NUM) +#define OFFBASE_TO_REPCODE(o) (assert(OFFBASE_IS_REPCODE(o)), (o)) /* ret= urns ID 1,2,3 */ + +/*! ZSTD_storeSeqOnly() : + * Store a sequence (litlen, litPtr, offBase and matchLength) into SeqSto= re_t. + * Literals themselves are not copied, but @litPtr is updated. + * @offBase : Users should employ macros REPCODE_TO_OFFBASE() and OFFSET_= TO_OFFBASE(). 
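+ *            Illustrative call (not part of the patch):
+ *            ZSTD_storeSeqOnly(ss, 11, REPCODE1_TO_OFFBASE, 4)
+ *            accounts for 11 literals plus a 4-byte match against
+ *            repcode 1, while OFFSET_TO_OFFBASE(100) would encode a
+ *            raw offset of 100.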
+ * @matchLength : must be >=3D MINMATCH +*/ +HINT_INLINE UNUSED_ATTR void +ZSTD_storeSeqOnly(SeqStore_t* seqStorePtr, + size_t litLength, + U32 offBase, + size_t matchLength) +{ + assert((size_t)(seqStorePtr->sequences - seqStorePtr->sequencesStart) = < seqStorePtr->maxNbSeq); + + /* literal Length */ + assert(litLength <=3D ZSTD_BLOCKSIZE_MAX); + if (UNLIKELY(litLength>0xFFFF)) { + assert(seqStorePtr->longLengthType =3D=3D ZSTD_llt_none); /* there= can only be a single long length */ + seqStorePtr->longLengthType =3D ZSTD_llt_literalLength; + seqStorePtr->longLengthPos =3D (U32)(seqStorePtr->sequences - seqS= torePtr->sequencesStart); + } + seqStorePtr->sequences[0].litLength =3D (U16)litLength; + + /* match offset */ + seqStorePtr->sequences[0].offBase =3D offBase; + + /* match Length */ + assert(matchLength <=3D ZSTD_BLOCKSIZE_MAX); + assert(matchLength >=3D MINMATCH); + { size_t const mlBase =3D matchLength - MINMATCH; + if (UNLIKELY(mlBase>0xFFFF)) { + assert(seqStorePtr->longLengthType =3D=3D ZSTD_llt_none); /* t= here can only be a single long length */ + seqStorePtr->longLengthType =3D ZSTD_llt_matchLength; + seqStorePtr->longLengthPos =3D (U32)(seqStorePtr->sequences - = seqStorePtr->sequencesStart); + } + seqStorePtr->sequences[0].mlBase =3D (U16)mlBase; + } + + seqStorePtr->sequences++; +} =20 /*! ZSTD_storeSeq() : - * Store a sequence (litlen, litPtr, offCode and matchLength) into seqSto= re_t. - * @offBase_minus1 : Users should use employ macros STORE_REPCODE_X and S= TORE_OFFSET(). + * Store a sequence (litlen, litPtr, offBase and matchLength) into SeqSto= re_t. + * @offBase : Users should employ macros REPCODE_TO_OFFBASE() and OFFSET_= TO_OFFBASE(). * @matchLength : must be >=3D MINMATCH - * Allowed to overread literals up to litLimit. + * Allowed to over-read literals up to litLimit. */ HINT_INLINE UNUSED_ATTR void -ZSTD_storeSeq(seqStore_t* seqStorePtr, +ZSTD_storeSeq(SeqStore_t* seqStorePtr, size_t litLength, const BYTE* literals, const BYTE* litLimit, - U32 offBase_minus1, + U32 offBase, size_t matchLength) { BYTE const* const litLimit_w =3D litLimit - WILDCOPY_OVERLENGTH; @@ -596,8 +776,8 @@ ZSTD_storeSeq(seqStore_t* seqStorePtr, static const BYTE* g_start =3D NULL; if (g_start=3D=3DNULL) g_start =3D (const BYTE*)literals; /* note : i= ndex only works for compression within a single segment */ { U32 const pos =3D (U32)((const BYTE*)literals - g_start); - DEBUGLOG(6, "Cpos%7u :%3u literals, match%4u bytes at offCode%7u", - pos, (U32)litLength, (U32)matchLength, (U32)offBase_minus1); + DEBUGLOG(6, "Cpos%7u :%3u literals, match%4u bytes at offBase%7u", + pos, (U32)litLength, (U32)matchLength, (U32)offBase); } #endif assert((size_t)(seqStorePtr->sequences - seqStorePtr->sequencesStart) = < seqStorePtr->maxNbSeq); @@ -607,9 +787,9 @@ ZSTD_storeSeq(seqStore_t* seqStorePtr, assert(literals + litLength <=3D litLimit); if (litEnd <=3D litLimit_w) { /* Common case we can use wildcopy. - * First copy 16 bytes, because literals are likely short. - */ - assert(WILDCOPY_OVERLENGTH >=3D 16); + * First copy 16 bytes, because literals are likely short. 
+ */ + ZSTD_STATIC_ASSERT(WILDCOPY_OVERLENGTH >=3D 16); ZSTD_copy16(seqStorePtr->lit, literals); if (litLength > 16) { ZSTD_wildcopy(seqStorePtr->lit+16, literals+16, (ptrdiff_t)lit= Length-16, ZSTD_no_overlap); @@ -619,44 +799,22 @@ ZSTD_storeSeq(seqStore_t* seqStorePtr, } seqStorePtr->lit +=3D litLength; =20 - /* literal Length */ - if (litLength>0xFFFF) { - assert(seqStorePtr->longLengthType =3D=3D ZSTD_llt_none); /* there= can only be a single long length */ - seqStorePtr->longLengthType =3D ZSTD_llt_literalLength; - seqStorePtr->longLengthPos =3D (U32)(seqStorePtr->sequences - seqS= torePtr->sequencesStart); - } - seqStorePtr->sequences[0].litLength =3D (U16)litLength; - - /* match offset */ - seqStorePtr->sequences[0].offBase =3D STORED_TO_OFFBASE(offBase_minus1= ); - - /* match Length */ - assert(matchLength >=3D MINMATCH); - { size_t const mlBase =3D matchLength - MINMATCH; - if (mlBase>0xFFFF) { - assert(seqStorePtr->longLengthType =3D=3D ZSTD_llt_none); /* t= here can only be a single long length */ - seqStorePtr->longLengthType =3D ZSTD_llt_matchLength; - seqStorePtr->longLengthPos =3D (U32)(seqStorePtr->sequences - = seqStorePtr->sequencesStart); - } - seqStorePtr->sequences[0].mlBase =3D (U16)mlBase; - } - - seqStorePtr->sequences++; + ZSTD_storeSeqOnly(seqStorePtr, litLength, offBase, matchLength); } =20 /* ZSTD_updateRep() : * updates in-place @rep (array of repeat offsets) - * @offBase_minus1 : sum-type, with same numeric representation as ZSTD_st= oreSeq() + * @offBase : sum-type, using numeric representation of ZSTD_storeSeq() */ MEM_STATIC void -ZSTD_updateRep(U32 rep[ZSTD_REP_NUM], U32 const offBase_minus1, U32 const = ll0) +ZSTD_updateRep(U32 rep[ZSTD_REP_NUM], U32 const offBase, U32 const ll0) { - if (STORED_IS_OFFSET(offBase_minus1)) { /* full offset */ + if (OFFBASE_IS_OFFSET(offBase)) { /* full offset */ rep[2] =3D rep[1]; rep[1] =3D rep[0]; - rep[0] =3D STORED_OFFSET(offBase_minus1); + rep[0] =3D OFFBASE_TO_OFFSET(offBase); } else { /* repcode */ - U32 const repCode =3D STORED_REPCODE(offBase_minus1) - 1 + ll0; + U32 const repCode =3D OFFBASE_TO_REPCODE(offBase) - 1 + ll0; if (repCode > 0) { /* note : if repCode=3D=3D0, no change */ U32 const currentOffset =3D (repCode=3D=3DZSTD_REP_NUM) ? (rep= [0] - 1) : rep[repCode]; rep[2] =3D (repCode >=3D 2) ? 
rep[1] : rep[2]; @@ -670,14 +828,14 @@ ZSTD_updateRep(U32 rep[ZSTD_REP_NUM], U32 const offBa= se_minus1, U32 const ll0) =20 typedef struct repcodes_s { U32 rep[3]; -} repcodes_t; +} Repcodes_t; =20 -MEM_STATIC repcodes_t -ZSTD_newRep(U32 const rep[ZSTD_REP_NUM], U32 const offBase_minus1, U32 con= st ll0) +MEM_STATIC Repcodes_t +ZSTD_newRep(U32 const rep[ZSTD_REP_NUM], U32 const offBase, U32 const ll0) { - repcodes_t newReps; + Repcodes_t newReps; ZSTD_memcpy(&newReps, rep, sizeof(newReps)); - ZSTD_updateRep(newReps.rep, offBase_minus1, ll0); + ZSTD_updateRep(newReps.rep, offBase, ll0); return newReps; } =20 @@ -685,59 +843,6 @@ ZSTD_newRep(U32 const rep[ZSTD_REP_NUM], U32 const off= Base_minus1, U32 const ll0 /*-************************************* * Match length counter ***************************************/ -static unsigned ZSTD_NbCommonBytes (size_t val) -{ - if (MEM_isLittleEndian()) { - if (MEM_64bits()) { -# if (__GNUC__ >=3D 4) - return (__builtin_ctzll((U64)val) >> 3); -# else - static const int DeBruijnBytePos[64] =3D { 0, 0, 0, 0, 0, 1, 1= , 2, - 0, 3, 1, 3, 1, 4, 2, = 7, - 0, 2, 3, 6, 1, 5, 3, = 5, - 1, 3, 4, 4, 2, 5, 6, = 7, - 7, 0, 1, 2, 3, 3, 4, = 6, - 2, 6, 5, 5, 3, 4, 5, = 6, - 7, 1, 2, 4, 6, 4, 4, = 5, - 7, 2, 6, 5, 7, 6, 7, = 7 }; - return DeBruijnBytePos[((U64)((val & -(long long)val) * 0x0218= A392CDABBD3FULL)) >> 58]; -# endif - } else { /* 32 bits */ -# if (__GNUC__ >=3D 3) - return (__builtin_ctz((U32)val) >> 3); -# else - static const int DeBruijnBytePos[32] =3D { 0, 0, 3, 0, 3, 1, 3= , 0, - 3, 2, 2, 1, 3, 2, 0, = 1, - 3, 3, 1, 2, 2, 2, 2, = 0, - 3, 1, 2, 0, 1, 0, 1, = 1 }; - return DeBruijnBytePos[((U32)((val & -(S32)val) * 0x077CB531U)= ) >> 27]; -# endif - } - } else { /* Big Endian CPU */ - if (MEM_64bits()) { -# if (__GNUC__ >=3D 4) - return (__builtin_clzll(val) >> 3); -# else - unsigned r; - const unsigned n32 =3D sizeof(size_t)*4; /* calculate this w= ay due to compiler complaining in 32-bits mode */ - if (!(val>>n32)) { r=3D4; } else { r=3D0; val>>=3Dn32; } - if (!(val>>16)) { r+=3D2; val>>=3D8; } else { val>>=3D24; } - r +=3D (!val); - return r; -# endif - } else { /* 32 bits */ -# if (__GNUC__ >=3D 3) - return (__builtin_clz((U32)val) >> 3); -# else - unsigned r; - if (!(val>>16)) { r=3D2; val>>=3D8; } else { r=3D0; val>>=3D24= ; } - r +=3D (!val); - return r; -# endif - } } -} - - MEM_STATIC size_t ZSTD_count(const BYTE* pIn, const BYTE* pMatch, const BY= TE* const pInLimit) { const BYTE* const pStart =3D pIn; @@ -771,8 +876,8 @@ ZSTD_count_2segments(const BYTE* ip, const BYTE* match, size_t const matchLength =3D ZSTD_count(ip, match, vEnd); if (match + matchLength !=3D mEnd) return matchLength; DEBUGLOG(7, "ZSTD_count_2segments: found a 2-parts match (current leng= th=3D=3D%zu)", matchLength); - DEBUGLOG(7, "distance from match beginning to end dictionary =3D %zi",= mEnd - match); - DEBUGLOG(7, "distance from current pos to end buffer =3D %zi", iEnd - = ip); + DEBUGLOG(7, "distance from match beginning to end dictionary =3D %i", = (int)(mEnd - match)); + DEBUGLOG(7, "distance from current pos to end buffer =3D %i", (int)(iE= nd - ip)); DEBUGLOG(7, "next byte : ip=3D=3D%02X, istart=3D=3D%02X", ip[matchLeng= th], *iStart); DEBUGLOG(7, "final match length =3D %zu", matchLength + ZSTD_count(ip+= matchLength, iStart, iEnd)); return matchLength + ZSTD_count(ip+matchLength, iStart, iEnd); @@ -783,32 +888,43 @@ ZSTD_count_2segments(const BYTE* ip, const BYTE* matc= h, * Hashes ***************************************/ static const U32 prime3bytes =3D 
506832829U; -static U32 ZSTD_hash3(U32 u, U32 h) { return ((u << (32-24)) * prime3by= tes) >> (32-h) ; } -MEM_STATIC size_t ZSTD_hash3Ptr(const void* ptr, U32 h) { return ZSTD_hash= 3(MEM_readLE32(ptr), h); } /* only in zstd_opt.h */ +static U32 ZSTD_hash3(U32 u, U32 h, U32 s) { assert(h <=3D 32); return = (((u << (32-24)) * prime3bytes) ^ s) >> (32-h) ; } +MEM_STATIC size_t ZSTD_hash3Ptr(const void* ptr, U32 h) { return ZSTD_hash= 3(MEM_readLE32(ptr), h, 0); } /* only in zstd_opt.h */ +MEM_STATIC size_t ZSTD_hash3PtrS(const void* ptr, U32 h, U32 s) { return Z= STD_hash3(MEM_readLE32(ptr), h, s); } =20 static const U32 prime4bytes =3D 2654435761U; -static U32 ZSTD_hash4(U32 u, U32 h) { return (u * prime4bytes) >> (32-h= ) ; } -static size_t ZSTD_hash4Ptr(const void* ptr, U32 h) { return ZSTD_hash4(ME= M_read32(ptr), h); } +static U32 ZSTD_hash4(U32 u, U32 h, U32 s) { assert(h <=3D 32); return = ((u * prime4bytes) ^ s) >> (32-h) ; } +static size_t ZSTD_hash4Ptr(const void* ptr, U32 h) { return ZSTD_hash4(ME= M_readLE32(ptr), h, 0); } +static size_t ZSTD_hash4PtrS(const void* ptr, U32 h, U32 s) { return ZSTD_= hash4(MEM_readLE32(ptr), h, s); } =20 static const U64 prime5bytes =3D 889523592379ULL; -static size_t ZSTD_hash5(U64 u, U32 h) { return (size_t)(((u << (64-40)) = * prime5bytes) >> (64-h)) ; } -static size_t ZSTD_hash5Ptr(const void* p, U32 h) { return ZSTD_hash5(MEM_= readLE64(p), h); } +static size_t ZSTD_hash5(U64 u, U32 h, U64 s) { assert(h <=3D 64); return = (size_t)((((u << (64-40)) * prime5bytes) ^ s) >> (64-h)) ; } +static size_t ZSTD_hash5Ptr(const void* p, U32 h) { return ZSTD_hash5(MEM_= readLE64(p), h, 0); } +static size_t ZSTD_hash5PtrS(const void* p, U32 h, U64 s) { return ZSTD_ha= sh5(MEM_readLE64(p), h, s); } =20 static const U64 prime6bytes =3D 227718039650203ULL; -static size_t ZSTD_hash6(U64 u, U32 h) { return (size_t)(((u << (64-48)) = * prime6bytes) >> (64-h)) ; } -static size_t ZSTD_hash6Ptr(const void* p, U32 h) { return ZSTD_hash6(MEM_= readLE64(p), h); } +static size_t ZSTD_hash6(U64 u, U32 h, U64 s) { assert(h <=3D 64); return = (size_t)((((u << (64-48)) * prime6bytes) ^ s) >> (64-h)) ; } +static size_t ZSTD_hash6Ptr(const void* p, U32 h) { return ZSTD_hash6(MEM_= readLE64(p), h, 0); } +static size_t ZSTD_hash6PtrS(const void* p, U32 h, U64 s) { return ZSTD_ha= sh6(MEM_readLE64(p), h, s); } =20 static const U64 prime7bytes =3D 58295818150454627ULL; -static size_t ZSTD_hash7(U64 u, U32 h) { return (size_t)(((u << (64-56)) = * prime7bytes) >> (64-h)) ; } -static size_t ZSTD_hash7Ptr(const void* p, U32 h) { return ZSTD_hash7(MEM_= readLE64(p), h); } +static size_t ZSTD_hash7(U64 u, U32 h, U64 s) { assert(h <=3D 64); return = (size_t)((((u << (64-56)) * prime7bytes) ^ s) >> (64-h)) ; } +static size_t ZSTD_hash7Ptr(const void* p, U32 h) { return ZSTD_hash7(MEM_= readLE64(p), h, 0); } +static size_t ZSTD_hash7PtrS(const void* p, U32 h, U64 s) { return ZSTD_ha= sh7(MEM_readLE64(p), h, s); } =20 static const U64 prime8bytes =3D 0xCF1BBCDCB7A56463ULL; -static size_t ZSTD_hash8(U64 u, U32 h) { return (size_t)(((u) * prime8byte= s) >> (64-h)) ; } -static size_t ZSTD_hash8Ptr(const void* p, U32 h) { return ZSTD_hash8(MEM_= readLE64(p), h); } +static size_t ZSTD_hash8(U64 u, U32 h, U64 s) { assert(h <=3D 64); return = (size_t)((((u) * prime8bytes) ^ s) >> (64-h)) ; } +static size_t ZSTD_hash8Ptr(const void* p, U32 h) { return ZSTD_hash8(MEM_= readLE64(p), h, 0); } +static size_t ZSTD_hash8PtrS(const void* p, U32 h, U64 s) { return ZSTD_ha= sh8(MEM_readLE64(p), h, s); } + 
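+/* Illustrative sketch (not part of upstream): the *PtrS variants above
+ * XOR a caller-provided salt into the product before the final shift,
+ * so a matchfinder can re-key its hash table per context:
+ *
+ *     U64 const salt =3D someSeed;   // hypothetical per-cctx value
+ *     size_t const h =3D ZSTD_hash8PtrS(ip, hBits, salt);
+ *     hashTable[h] =3D curr;         // used exactly like ZSTD_hash8Ptr
+ *
+ * With salt=3D=3D0 each salted variant reduces to its unsalted twin. */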
=20 MEM_STATIC FORCE_INLINE_ATTR size_t ZSTD_hashPtr(const void* p, U32 hBits, U32 mls) { + /* Although some of these hashes do support hBits up to 64, some do no= t. + * To be on the safe side, always avoid hBits > 32. */ + assert(hBits <=3D 32); + switch(mls) { default: @@ -820,6 +936,24 @@ size_t ZSTD_hashPtr(const void* p, U32 hBits, U32 mls) } } =20 +MEM_STATIC FORCE_INLINE_ATTR +size_t ZSTD_hashPtrSalted(const void* p, U32 hBits, U32 mls, const U64 has= hSalt) { + /* Although some of these hashes do support hBits up to 64, some do no= t. + * To be on the safe side, always avoid hBits > 32. */ + assert(hBits <=3D 32); + + switch(mls) + { + default: + case 4: return ZSTD_hash4PtrS(p, hBits, (U32)hashSalt); + case 5: return ZSTD_hash5PtrS(p, hBits, hashSalt); + case 6: return ZSTD_hash6PtrS(p, hBits, hashSalt); + case 7: return ZSTD_hash7PtrS(p, hBits, hashSalt); + case 8: return ZSTD_hash8PtrS(p, hBits, hashSalt); + } +} + + /* ZSTD_ipow() : * Return base^exponent. */ @@ -881,11 +1015,12 @@ MEM_STATIC U64 ZSTD_rollingHash_rotate(U64 hash, BYT= E toRemove, BYTE toAdd, U64 /*-************************************* * Round buffer management ***************************************/ -#if (ZSTD_WINDOWLOG_MAX_64 > 31) -# error "ZSTD_WINDOWLOG_MAX is too large : would overflow ZSTD_CURRENT_MAX" -#endif -/* Max current allowed */ -#define ZSTD_CURRENT_MAX ((3U << 29) + (1U << ZSTD_WINDOWLOG_MAX)) +/* Max @current value allowed: + * In 32-bit mode: we want to avoid crossing the 2 GB limit, + * reducing risks of side effects in case of signed operat= ions on indexes. + * In 64-bit mode: we want to ensure that adding the maximum job size (512= MB) + * doesn't overflow U32 index capacity (4 GB) */ +#define ZSTD_CURRENT_MAX (MEM_64bits() ? 3500U MB : 2000U MB) /* Maximum chunk size before overflow correction needs to be called again = */ #define ZSTD_CHUNKSIZE_MAX = \ ( ((U32)-1) /* Maximum ending current index */ = \ @@ -925,7 +1060,7 @@ MEM_STATIC U32 ZSTD_window_hasExtDict(ZSTD_window_t co= nst window) * Inspects the provided matchState and figures out what dictMode should be * passed to the compressor. */ -MEM_STATIC ZSTD_dictMode_e ZSTD_matchState_dictMode(const ZSTD_matchState_= t *ms) +MEM_STATIC ZSTD_dictMode_e ZSTD_matchState_dictMode(const ZSTD_MatchState_= t *ms) { return ZSTD_window_hasExtDict(ms->window) ? ZSTD_extDict : @@ -1011,7 +1146,9 @@ MEM_STATIC U32 ZSTD_window_needOverflowCorrection(ZST= D_window_t const window, * The least significant cycleLog bits of the indices must remain the same, * which may be 0. Every index up to maxDist in the past must be valid. */ -MEM_STATIC U32 ZSTD_window_correctOverflow(ZSTD_window_t* window, U32 cycl= eLog, +MEM_STATIC +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +U32 ZSTD_window_correctOverflow(ZSTD_window_t* window, U32 cycleLog, U32 maxDist, void const* src) { /* preemptive overflow correction: @@ -1112,7 +1249,7 @@ ZSTD_window_enforceMaxDist(ZSTD_window_t* window, const void* blockEnd, U32 maxDist, U32* loadedDictEndPtr, - const ZSTD_matchState_t** dictMatchStatePtr) + const ZSTD_MatchState_t** dictMatchStatePtr) { U32 const blockEndIdx =3D (U32)((BYTE const*)blockEnd - window->base); U32 const loadedDictEnd =3D (loadedDictEndPtr !=3D NULL) ? 
*loadedDict= EndPtr : 0; @@ -1157,7 +1294,7 @@ ZSTD_checkDictValidity(const ZSTD_window_t* window, const void* blockEnd, U32 maxDist, U32* loadedDictEndPtr, - const ZSTD_matchState_t** dictMatchStatePtr) + const ZSTD_MatchState_t** dictMatchStatePtr) { assert(loadedDictEndPtr !=3D NULL); assert(dictMatchStatePtr !=3D NULL); @@ -1167,10 +1304,15 @@ ZSTD_checkDictValidity(const ZSTD_window_t* window, (unsigned)blockEndIdx, (unsigned)maxDist, (unsigned)lo= adedDictEnd); assert(blockEndIdx >=3D loadedDictEnd); =20 - if (blockEndIdx > loadedDictEnd + maxDist) { + if (blockEndIdx > loadedDictEnd + maxDist || loadedDictEnd !=3D wi= ndow->dictLimit) { /* On reaching window size, dictionaries are invalidated. * For simplification, if window size is reached anywhere with= in next block, * the dictionary is invalidated for the full block. + * + * We also have to invalidate the dictionary if ZSTD_window_up= date() has detected + * non-contiguous segments, which means that loadedDictEnd != =3D window->dictLimit. + * loadedDictEnd may be 0, if forceWindow is true, but in that= case we never use + * dictMatchState, so setting it to NULL is not a problem. */ DEBUGLOG(6, "invalidating dictionary for current block (distan= ce > windowSize)"); *loadedDictEndPtr =3D 0; @@ -1199,9 +1341,11 @@ MEM_STATIC void ZSTD_window_init(ZSTD_window_t* wind= ow) { * forget about the extDict. Handles overlap of the prefix and extDict. * Returns non-zero if the segment is contiguous. */ -MEM_STATIC U32 ZSTD_window_update(ZSTD_window_t* window, - void const* src, size_t srcSize, - int forceNonContiguous) +MEM_STATIC +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +U32 ZSTD_window_update(ZSTD_window_t* window, + const void* src, size_t srcSize, + int forceNonContiguous) { BYTE const* const ip =3D (BYTE const*)src; U32 contiguous =3D 1; @@ -1228,8 +1372,9 @@ MEM_STATIC U32 ZSTD_window_update(ZSTD_window_t* wind= ow, /* if input and dictionary overlap : reduce dictionary (area presumed = modified by input) */ if ( (ip+srcSize > window->dictBase + window->lowLimit) & (ip < window->dictBase + window->dictLimit)) { - ptrdiff_t const highInputIdx =3D (ip + srcSize) - window->dictBase; - U32 const lowLimitMax =3D (highInputIdx > (ptrdiff_t)window->dictL= imit) ? window->dictLimit : (U32)highInputIdx; + size_t const highInputIdx =3D (size_t)((ip + srcSize) - window->di= ctBase); + U32 const lowLimitMax =3D (highInputIdx > (size_t)window->dictLimi= t) ? window->dictLimit : (U32)highInputIdx; + assert(highInputIdx < UINT_MAX); window->lowLimit =3D lowLimitMax; DEBUGLOG(5, "Overlapping extDict and input : new lowLimit =3D %u",= window->lowLimit); } @@ -1239,7 +1384,7 @@ MEM_STATIC U32 ZSTD_window_update(ZSTD_window_t* wind= ow, /* * Returns the lowest allowed match index. It may either be in the ext-dic= t or the prefix. */ -MEM_STATIC U32 ZSTD_getLowestMatchIndex(const ZSTD_matchState_t* ms, U32 c= urr, unsigned windowLog) +MEM_STATIC U32 ZSTD_getLowestMatchIndex(const ZSTD_MatchState_t* ms, U32 c= urr, unsigned windowLog) { U32 const maxDistance =3D 1U << windowLog; U32 const lowestValid =3D ms->window.lowLimit; @@ -1256,7 +1401,7 @@ MEM_STATIC U32 ZSTD_getLowestMatchIndex(const ZSTD_ma= tchState_t* ms, U32 curr, u /* * Returns the lowest allowed match index in the prefix. 
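 * (in effect max(window.dictLimit, curr - (1<<windowLog));
 *  paraphrase added for illustration)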
*/ -MEM_STATIC U32 ZSTD_getLowestPrefixIndex(const ZSTD_matchState_t* ms, U32 = curr, unsigned windowLog) +MEM_STATIC U32 ZSTD_getLowestPrefixIndex(const ZSTD_MatchState_t* ms, U32 = curr, unsigned windowLog) { U32 const maxDistance =3D 1U << windowLog; U32 const lowestValid =3D ms->window.dictLimit; @@ -1269,6 +1414,13 @@ MEM_STATIC U32 ZSTD_getLowestPrefixIndex(const ZSTD_= matchState_t* ms, U32 curr, return matchLowest; } =20 +/* index_safety_check: + * intentional underflow : ensure repIndex isn't overlapping dict + prefix + * @return 1 if values are not overlapping, + * 0 otherwise */ +MEM_STATIC int ZSTD_index_overlap_check(const U32 prefixLowestIndex, const= U32 repIndex) { + return ((U32)((prefixLowestIndex-1) - repIndex) >=3D 3); +} =20 =20 /* debug functions */ @@ -1302,7 +1454,42 @@ MEM_STATIC void ZSTD_debugTable(const U32* table, U3= 2 max) =20 #endif =20 +/* Short Cache */ + +/* Normally, zstd matchfinders follow this flow: + * 1. Compute hash at ip + * 2. Load index from hashTable[hash] + * 3. Check if *ip =3D=3D *(base + index) + * In dictionary compression, loading *(base + index) is often an L2 or ev= en L3 miss. + * + * Short cache is an optimization which allows us to avoid step 3 most of = the time + * when the data doesn't actually match. With short cache, the flow become= s: + * 1. Compute (hash, currentTag) at ip. currentTag is an 8-bit indepen= dent hash at ip. + * 2. Load (index, matchTag) from hashTable[hash]. See ZSTD_writeTagge= dIndex to understand how this works. + * 3. Only if currentTag =3D=3D matchTag, check *ip =3D=3D *(base + in= dex). Otherwise, continue. + * + * Currently, short cache is only implemented in CDict hashtables. Thus, i= ts use is limited to + * dictMatchState matchfinders. + */ +#define ZSTD_SHORT_CACHE_TAG_BITS 8 +#define ZSTD_SHORT_CACHE_TAG_MASK ((1u << ZSTD_SHORT_CACHE_TAG_BITS) - 1) + +/* Helper function for ZSTD_fillHashTable and ZSTD_fillDoubleHashTable. + * Unpacks hashAndTag into (hash, tag), then packs (index, tag) into hashT= able[hash]. */ +MEM_STATIC void ZSTD_writeTaggedIndex(U32* const hashTable, size_t hashAnd= Tag, U32 index) { + size_t const hash =3D hashAndTag >> ZSTD_SHORT_CACHE_TAG_BITS; + U32 const tag =3D (U32)(hashAndTag & ZSTD_SHORT_CACHE_TAG_MASK); + assert(index >> (32 - ZSTD_SHORT_CACHE_TAG_BITS) =3D=3D 0); + hashTable[hash] =3D (index << ZSTD_SHORT_CACHE_TAG_BITS) | tag; +} =20 +/* Helper function for short cache matchfinders. + * Unpacks tag1 and tag2 from lower bits of packedTag1 and packedTag2, the= n checks if the tags match. 
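+ * Worked example (illustrative): after ZSTD_writeTaggedIndex() stores
+ * index 0x123456 with tag 0xAB as 0x123456AB, two packed entries
+ * compare equal here iff their low ZSTD_SHORT_CACHE_TAG_BITS bits
+ * agree.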
*/ +MEM_STATIC int ZSTD_comparePackedTags(size_t packedTag1, size_t packedTag2= ) { + U32 const tag1 =3D packedTag1 & ZSTD_SHORT_CACHE_TAG_MASK; + U32 const tag2 =3D packedTag2 & ZSTD_SHORT_CACHE_TAG_MASK; + return tag1 =3D=3D tag2; +} =20 /* =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D * Shared internal declarations @@ -1319,6 +1506,25 @@ size_t ZSTD_loadCEntropy(ZSTD_compressedBlockState_t= * bs, void* workspace, =20 void ZSTD_reset_compressedBlockState(ZSTD_compressedBlockState_t* bs); =20 +typedef struct { + U32 idx; /* Index in array of ZSTD_Sequence */ + U32 posInSequence; /* Position within sequence at idx */ + size_t posInSrc; /* Number of bytes given by sequences provided so = far */ +} ZSTD_SequencePosition; + +/* for benchmark */ +size_t ZSTD_convertBlockSequences(ZSTD_CCtx* cctx, + const ZSTD_Sequence* const inSeqs, size_t nbSequen= ces, + int const repcodeResolution); + +typedef struct { + size_t nbSequences; + size_t blockSize; + size_t litSize; +} BlockSummary; + +BlockSummary ZSTD_get1BlockSummary(const ZSTD_Sequence* seqs, size_t nbSeq= s); + /* =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D * Private declarations * These prototypes shall only be called from within lib/compress @@ -1330,7 +1536,7 @@ void ZSTD_reset_compressedBlockState(ZSTD_compressedB= lockState_t* bs); * Note: srcSizeHint =3D=3D 0 means 0! */ ZSTD_compressionParameters ZSTD_getCParamsFromCCtxParams( - const ZSTD_CCtx_params* CCtxParams, U64 srcSizeHint, size_t dictSi= ze, ZSTD_cParamMode_e mode); + const ZSTD_CCtx_params* CCtxParams, U64 srcSizeHint, size_t dictSi= ze, ZSTD_CParamMode_e mode); =20 /*! ZSTD_initCStream_internal() : * Private use only. Init streaming operation. @@ -1342,7 +1548,7 @@ size_t ZSTD_initCStream_internal(ZSTD_CStream* zcs, const ZSTD_CDict* cdict, const ZSTD_CCtx_params* params, unsigned long long pl= edgedSrcSize); =20 -void ZSTD_resetSeqStore(seqStore_t* ssPtr); +void ZSTD_resetSeqStore(SeqStore_t* ssPtr); =20 /*! ZSTD_getCParamsFromCDict() : * as the name implies */ @@ -1381,11 +1587,10 @@ size_t ZSTD_writeLastEmptyBlock(void* dst, size_t d= stCapacity); * This cannot be used when long range matching is enabled. * Zstd will use these sequences, and pass the literals to a secondary blo= ck * compressor. - * @return : An error code on failure. * NOTE: seqs are not verified! Invalid sequences can cause out-of-bounds = memory * access and data corruption. */ -size_t ZSTD_referenceExternalSequences(ZSTD_CCtx* cctx, rawSeq* seq, size_= t nbSeq); +void ZSTD_referenceExternalSequences(ZSTD_CCtx* cctx, rawSeq* seq, size_t = nbSeq); =20 /* ZSTD_cycleLog() : * condition for correct operation : hashLog > 1 */ @@ -1396,4 +1601,28 @@ U32 ZSTD_cycleLog(U32 hashLog, ZSTD_strategy strat); */ void ZSTD_CCtx_trace(ZSTD_CCtx* cctx, size_t extraCSize); =20 +/* Returns 1 if an external sequence producer is registered, otherwise ret= urns 0. 
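+ * (Illustrative note: callers can branch on this to route block
+ * compression to params->extSeqProdFunc, with fallback behavior
+ * governed by enableMatchFinderFallback above.)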
*/ +MEM_STATIC int ZSTD_hasExtSeqProd(const ZSTD_CCtx_params* params) { + return params->extSeqProdFunc !=3D NULL; +} + +/* =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + * Deprecated definitions that are still used internally to avoid + * deprecation warnings. These functions are exactly equivalent to + * their public variants, but avoid the deprecation warnings. + * =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D */ + +size_t ZSTD_compressBegin_usingCDict_deprecated(ZSTD_CCtx* cctx, const ZST= D_CDict* cdict); + +size_t ZSTD_compressContinue_public(ZSTD_CCtx* cctx, + void* dst, size_t dstCapacity, + const void* src, size_t srcSize); + +size_t ZSTD_compressEnd_public(ZSTD_CCtx* cctx, + void* dst, size_t dstCapacity, + const void* src, size_t srcSize); + +size_t ZSTD_compressBlock_deprecated(ZSTD_CCtx* cctx, void* dst, size_t ds= tCapacity, const void* src, size_t srcSize); + + #endif /* ZSTD_COMPRESS_H */ diff --git a/lib/zstd/compress/zstd_compress_literals.c b/lib/zstd/compress= /zstd_compress_literals.c index 52b0a8059aba..ec39b4299b6f 100644 --- a/lib/zstd/compress/zstd_compress_literals.c +++ b/lib/zstd/compress/zstd_compress_literals.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -13,11 +14,36 @@ ***************************************/ #include "zstd_compress_literals.h" =20 + +/* ************************************************************** +* Debug Traces +****************************************************************/ +#if DEBUGLEVEL >=3D 2 + +static size_t showHexa(const void* src, size_t srcSize) +{ + const BYTE* const ip =3D (const BYTE*)src; + size_t u; + for (u=3D0; u31) + (srcSize>4095); =20 + DEBUGLOG(5, "ZSTD_noCompressLiterals: srcSize=3D%zu, dstCapacity=3D%zu= ", srcSize, dstCapacity); + RETURN_ERROR_IF(srcSize + flSize > dstCapacity, dstSize_tooSmall, ""); =20 switch(flSize) @@ -36,16 +62,30 @@ size_t ZSTD_noCompressLiterals (void* dst, size_t dstCa= pacity, const void* src, } =20 ZSTD_memcpy(ostart + flSize, src, srcSize); - DEBUGLOG(5, "Raw literals: %u -> %u", (U32)srcSize, (U32)(srcSize + fl= Size)); + DEBUGLOG(5, "Raw (uncompressed) literals: %u -> %u", (U32)srcSize, (U3= 2)(srcSize + flSize)); return srcSize + flSize; } =20 +static int allBytesIdentical(const void* src, size_t srcSize) +{ + assert(srcSize >=3D 1); + assert(src !=3D NULL); + { const BYTE b =3D ((const BYTE*)src)[0]; + size_t p; + for (p=3D1; p31) + (srcSize>4095); =20 - (void)dstCapacity; /* dstCapacity already guaranteed to be >=3D4, hen= ce large enough */ + assert(dstCapacity >=3D 4); (void)dstCapacity; + assert(allBytesIdentical(src, srcSize)); =20 switch(flSize) { @@ -63,28 +103,51 @@ size_t ZSTD_compressRleLiteralsBlock (void* dst, size_= t dstCapacity, const void* } =20 ostart[flSize] =3D *(const BYTE*)src; - DEBUGLOG(5, "RLE literals: %u -> %u", (U32)srcSize, (U32)flSize + 1); + DEBUGLOG(5, "RLE : Repeated Literal (%02X: %u times) -> %u bytes encod= ed", ((const BYTE*)src)[0], (U32)srcSize, (U32)flSize + 1); return flSize+1; } =20 -size_t ZSTD_compressLiterals 
(ZSTD_hufCTables_t const* prevHuf, - ZSTD_hufCTables_t* nextHuf, - ZSTD_strategy strategy, int disableLiteralCo= mpression, - void* dst, size_t dstCapacity, - const void* src, size_t srcSize, - void* entropyWorkspace, size_t entropyWorksp= aceSize, - const int bmi2, - unsigned suspectUncompressible) +/* ZSTD_minLiteralsToCompress() : + * returns minimal amount of literals + * for literal compression to even be attempted. + * Minimum is made tighter as compression strategy increases. + */ +static size_t +ZSTD_minLiteralsToCompress(ZSTD_strategy strategy, HUF_repeat huf_repeat) +{ + assert((int)strategy >=3D 0); + assert((int)strategy <=3D 9); + /* btultra2 : min 8 bytes; + * then 2x larger for each successive compression strategy + * max threshold 64 bytes */ + { int const shift =3D MIN(9-(int)strategy, 3); + size_t const mintc =3D (huf_repeat =3D=3D HUF_repeat_valid) ? 6 : = (size_t)8 << shift; + DEBUGLOG(7, "minLiteralsToCompress =3D %zu", mintc); + return mintc; + } +} + +size_t ZSTD_compressLiterals ( + void* dst, size_t dstCapacity, + const void* src, size_t srcSize, + void* entropyWorkspace, size_t entropyWorkspaceSize, + const ZSTD_hufCTables_t* prevHuf, + ZSTD_hufCTables_t* nextHuf, + ZSTD_strategy strategy, + int disableLiteralCompression, + int suspectUncompressible, + int bmi2) { - size_t const minGain =3D ZSTD_minGain(srcSize, strategy); size_t const lhSize =3D 3 + (srcSize >=3D 1 KB) + (srcSize >=3D 16 KB); BYTE* const ostart =3D (BYTE*)dst; U32 singleStream =3D srcSize < 256; - symbolEncodingType_e hType =3D set_compressed; + SymbolEncodingType_e hType =3D set_compressed; size_t cLitSize; =20 - DEBUGLOG(5,"ZSTD_compressLiterals (disableLiteralCompression=3D%i srcS= ize=3D%u)", - disableLiteralCompression, (U32)srcSize); + DEBUGLOG(5,"ZSTD_compressLiterals (disableLiteralCompression=3D%i, src= Size=3D%u, dstCapacity=3D%zu)", + disableLiteralCompression, (U32)srcSize, dstCapacity); + + DEBUGLOG(6, "Completed literals listing (%zu bytes)", showHexa(src, sr= cSize)); =20 /* Prepare nextEntropy assuming reusing the existing table */ ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf)); @@ -92,40 +155,51 @@ size_t ZSTD_compressLiterals (ZSTD_hufCTables_t const*= prevHuf, if (disableLiteralCompression) return ZSTD_noCompressLiterals(dst, dstCapacity, src, srcSize); =20 - /* small ? don't even attempt compression (speed opt) */ -# define COMPRESS_LITERALS_SIZE_MIN 63 - { size_t const minLitSize =3D (prevHuf->repeatMode =3D=3D HUF_repeat= _valid) ? 6 : COMPRESS_LITERALS_SIZE_MIN; - if (srcSize <=3D minLitSize) return ZSTD_noCompressLiterals(dst, d= stCapacity, src, srcSize); - } + /* if too small, don't even attempt compression (speed opt) */ + if (srcSize < ZSTD_minLiteralsToCompress(strategy, prevHuf->repeatMode= )) + return ZSTD_noCompressLiterals(dst, dstCapacity, src, srcSize); =20 RETURN_ERROR_IF(dstCapacity < lhSize+1, dstSize_tooSmall, "not enough = space for compression"); { HUF_repeat repeat =3D prevHuf->repeatMode; - int const preferRepeat =3D strategy < ZSTD_lazy ? srcSize <=3D 102= 4 : 0; + int const flags =3D 0 + | (bmi2 ? HUF_flags_bmi2 : 0) + | (strategy < ZSTD_lazy && srcSize <=3D 1024 ? HUF_flags_prefe= rRepeat : 0) + | (strategy >=3D HUF_OPTIMAL_DEPTH_THRESHOLD ? HUF_flags_optim= alDepth : 0) + | (suspectUncompressible ? 
HUF_flags_suspectUncompressible : 0= ); + + typedef size_t (*huf_compress_f)(void*, size_t, const void*, size_= t, unsigned, unsigned, void*, size_t, HUF_CElt*, HUF_repeat*, int); + huf_compress_f huf_compress; if (repeat =3D=3D HUF_repeat_valid && lhSize =3D=3D 3) singleStrea= m =3D 1; - cLitSize =3D singleStream ? - HUF_compress1X_repeat( - ostart+lhSize, dstCapacity-lhSize, src, srcSize, - HUF_SYMBOLVALUE_MAX, HUF_TABLELOG_DEFAULT, entropyWorkspac= e, entropyWorkspaceSize, - (HUF_CElt*)nextHuf->CTable, &repeat, preferRepeat, bmi2, s= uspectUncompressible) : - HUF_compress4X_repeat( - ostart+lhSize, dstCapacity-lhSize, src, srcSize, - HUF_SYMBOLVALUE_MAX, HUF_TABLELOG_DEFAULT, entropyWorkspac= e, entropyWorkspaceSize, - (HUF_CElt*)nextHuf->CTable, &repeat, preferRepeat, bmi2, s= uspectUncompressible); + huf_compress =3D singleStream ? HUF_compress1X_repeat : HUF_compre= ss4X_repeat; + cLitSize =3D huf_compress(ostart+lhSize, dstCapacity-lhSize, + src, srcSize, + HUF_SYMBOLVALUE_MAX, LitHufLog, + entropyWorkspace, entropyWorkspaceSize, + (HUF_CElt*)nextHuf->CTable, + &repeat, flags); + DEBUGLOG(5, "%zu literals compressed into %zu bytes (before header= )", srcSize, cLitSize); if (repeat !=3D HUF_repeat_none) { /* reused the existing table */ - DEBUGLOG(5, "Reusing previous huffman table"); + DEBUGLOG(5, "reusing statistics from previous huffman block"); hType =3D set_repeat; } } =20 - if ((cLitSize=3D=3D0) || (cLitSize >=3D srcSize - minGain) || ERR_isEr= ror(cLitSize)) { - ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf)); - return ZSTD_noCompressLiterals(dst, dstCapacity, src, srcSize); - } + { size_t const minGain =3D ZSTD_minGain(srcSize, strategy); + if ((cLitSize=3D=3D0) || (cLitSize >=3D srcSize - minGain) || ERR_= isError(cLitSize)) { + ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf)); + return ZSTD_noCompressLiterals(dst, dstCapacity, src, srcSize); + } } if (cLitSize=3D=3D1) { - ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf)); - return ZSTD_compressRleLiteralsBlock(dst, dstCapacity, src, srcSiz= e); - } + /* A return value of 1 signals that the alphabet consists of a sin= gle symbol. + * However, in some rare circumstances, it could be the compressed= size (a single byte). + * For that outcome to have a chance to happen, it's necessary tha= t `srcSize < 8`. + * (it's also necessary to not generate statistics). + * Therefore, in such a case, actively check that all bytes are id= entical. 
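+ * (Example of the rare case: with reused statistics, 4 literals at
+ * 2 bits each fit in a single byte, so cLitSize=3D=3D1 alone does not
+ * prove an RLE block.)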
*/ + if ((srcSize >=3D 8) || allBytesIdentical(src, srcSize)) { + ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf)); + return ZSTD_compressRleLiteralsBlock(dst, dstCapacity, src, sr= cSize); + } } =20 if (hType =3D=3D set_compressed) { /* using a newly constructed table */ @@ -136,16 +210,19 @@ size_t ZSTD_compressLiterals (ZSTD_hufCTables_t const= * prevHuf, switch(lhSize) { case 3: /* 2 - 2 - 10 - 10 */ - { U32 const lhc =3D hType + ((!singleStream) << 2) + ((U32)srcSi= ze<<4) + ((U32)cLitSize<<14); + if (!singleStream) assert(srcSize >=3D MIN_LITERALS_FOR_4_STREAMS); + { U32 const lhc =3D hType + ((U32)(!singleStream) << 2) + ((U32)= srcSize<<4) + ((U32)cLitSize<<14); MEM_writeLE24(ostart, lhc); break; } case 4: /* 2 - 2 - 14 - 14 */ + assert(srcSize >=3D MIN_LITERALS_FOR_4_STREAMS); { U32 const lhc =3D hType + (2 << 2) + ((U32)srcSize<<4) + ((U32= )cLitSize<<18); MEM_writeLE32(ostart, lhc); break; } case 5: /* 2 - 2 - 18 - 18 */ + assert(srcSize >=3D MIN_LITERALS_FOR_4_STREAMS); { U32 const lhc =3D hType + (3 << 2) + ((U32)srcSize<<4) + ((U32= )cLitSize<<22); MEM_writeLE32(ostart, lhc); ostart[4] =3D (BYTE)(cLitSize >> 10); diff --git a/lib/zstd/compress/zstd_compress_literals.h b/lib/zstd/compress= /zstd_compress_literals.h index 9775fb97cb70..a2a85d6b69e5 100644 --- a/lib/zstd/compress/zstd_compress_literals.h +++ b/lib/zstd/compress/zstd_compress_literals.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -16,16 +17,24 @@ =20 size_t ZSTD_noCompressLiterals (void* dst, size_t dstCapacity, const void*= src, size_t srcSize); =20 +/* ZSTD_compressRleLiteralsBlock() : + * Conditions : + * - All bytes in @src are identical + * - dstCapacity >=3D 4 */ size_t ZSTD_compressRleLiteralsBlock (void* dst, size_t dstCapacity, const= void* src, size_t srcSize); =20 -/* If suspectUncompressible then some sampling checks will be run to poten= tially skip huffman coding */ -size_t ZSTD_compressLiterals (ZSTD_hufCTables_t const* prevHuf, - ZSTD_hufCTables_t* nextHuf, - ZSTD_strategy strategy, int disableLiteralCo= mpression, - void* dst, size_t dstCapacity, +/* ZSTD_compressLiterals(): + * @entropyWorkspace: must be aligned on 4-bytes boundaries + * @entropyWorkspaceSize : must be >=3D HUF_WORKSPACE_SIZE + * @suspectUncompressible: sampling checks, to potentially skip huffman co= ding + */ +size_t ZSTD_compressLiterals (void* dst, size_t dstCapacity, const void* src, size_t srcSize, void* entropyWorkspace, size_t entropyWorksp= aceSize, - const int bmi2, - unsigned suspectUncompressible); + const ZSTD_hufCTables_t* prevHuf, + ZSTD_hufCTables_t* nextHuf, + ZSTD_strategy strategy, int disableLiteralCo= mpression, + int suspectUncompressible, + int bmi2); =20 #endif /* ZSTD_COMPRESS_LITERALS_H */ diff --git a/lib/zstd/compress/zstd_compress_sequences.c b/lib/zstd/compres= s/zstd_compress_sequences.c index 21ddc1b37acf..256980c9d85a 100644 --- a/lib/zstd/compress/zstd_compress_sequences.c +++ b/lib/zstd/compress/zstd_compress_sequences.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -58,7 +59,7 @@ static unsigned ZSTD_useLowProbCount(size_t const nbSeq) { /* Heuristic: This should cover most blocks <=3D 16K and * start to fade out after 16K to about 32K depending on - * comprssibility. + * compressibility. */ return nbSeq >=3D 2048; } @@ -153,20 +154,20 @@ size_t ZSTD_crossEntropyCost(short const* norm, unsig= ned accuracyLog, return cost >> 8; } =20 -symbolEncodingType_e +SymbolEncodingType_e ZSTD_selectEncodingType( FSE_repeat* repeatMode, unsigned const* count, unsigned const max, size_t const mostFrequent, size_t nbSeq, unsigned const FSELog, FSE_CTable const* prevCTable, short const* defaultNorm, U32 defaultNormLog, - ZSTD_defaultPolicy_e const isDefaultAllowed, + ZSTD_DefaultPolicy_e const isDefaultAllowed, ZSTD_strategy const strategy) { ZSTD_STATIC_ASSERT(ZSTD_defaultDisallowed =3D=3D 0 && ZSTD_defaultAllo= wed !=3D 0); if (mostFrequent =3D=3D nbSeq) { *repeatMode =3D FSE_repeat_none; if (isDefaultAllowed && nbSeq <=3D 2) { - /* Prefer set_basic over set_rle when there are 2 or less symb= ols, + /* Prefer set_basic over set_rle when there are 2 or fewer sym= bols, * since RLE uses 1 byte, but set_basic uses 5-6 bits per symb= ol. * If basic encoding isn't possible, always choose RLE. */ @@ -241,7 +242,7 @@ typedef struct { =20 size_t ZSTD_buildCTable(void* dst, size_t dstCapacity, - FSE_CTable* nextCTable, U32 FSELog, symbolEncodingType_e t= ype, + FSE_CTable* nextCTable, U32 FSELog, SymbolEncodingType_e t= ype, unsigned* count, U32 max, const BYTE* codeTable, size_t nbSeq, const S16* defaultNorm, U32 defaultNormLog, U32 defaultMax, @@ -293,7 +294,7 @@ ZSTD_encodeSequences_body( FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable, FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable, FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable, - seqDef const* sequences, size_t nbSeq, int longOffsets) + SeqDef const* sequences, size_t nbSeq, int longOffsets) { BIT_CStream_t blockStream; FSE_CState_t stateMatchLength; @@ -387,7 +388,7 @@ ZSTD_encodeSequences_default( FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable, FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable, FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable, - seqDef const* sequences, size_t nbSeq, int longOffsets) + SeqDef const* sequences, size_t nbSeq, int longOffsets) { return ZSTD_encodeSequences_body(dst, dstCapacity, CTable_MatchLength, mlCodeTable, @@ -405,7 +406,7 @@ ZSTD_encodeSequences_bmi2( FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable, FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable, FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable, - seqDef const* sequences, size_t nbSeq, int longOffsets) + SeqDef const* sequences, size_t nbSeq, int longOffsets) { return ZSTD_encodeSequences_body(dst, dstCapacity, CTable_MatchLength, mlCodeTable, @@ -421,7 +422,7 @@ size_t ZSTD_encodeSequences( FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable, FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable, FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable, - seqDef const* sequences, size_t nbSeq, int longOffsets, int bm= i2) + SeqDef const* sequences, size_t nbSeq, int longOffsets, int bm= i2) { DEBUGLOG(5, "ZSTD_encodeSequences: dstCapacity =3D %u", (unsigned)dstC= apacity); #if DYNAMIC_BMI2 diff --git a/lib/zstd/compress/zstd_compress_sequences.h b/lib/zstd/compres= s/zstd_compress_sequences.h index 
7991364c2f71..14fdccb6547f 100644 --- a/lib/zstd/compress/zstd_compress_sequences.h +++ b/lib/zstd/compress/zstd_compress_sequences.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -11,26 +12,27 @@ #ifndef ZSTD_COMPRESS_SEQUENCES_H #define ZSTD_COMPRESS_SEQUENCES_H =20 +#include "zstd_compress_internal.h" /* SeqDef */ #include "../common/fse.h" /* FSE_repeat, FSE_CTable */ -#include "../common/zstd_internal.h" /* symbolEncodingType_e, ZSTD_strateg= y */ +#include "../common/zstd_internal.h" /* SymbolEncodingType_e, ZSTD_strateg= y */ =20 typedef enum { ZSTD_defaultDisallowed =3D 0, ZSTD_defaultAllowed =3D 1 -} ZSTD_defaultPolicy_e; +} ZSTD_DefaultPolicy_e; =20 -symbolEncodingType_e +SymbolEncodingType_e ZSTD_selectEncodingType( FSE_repeat* repeatMode, unsigned const* count, unsigned const max, size_t const mostFrequent, size_t nbSeq, unsigned const FSELog, FSE_CTable const* prevCTable, short const* defaultNorm, U32 defaultNormLog, - ZSTD_defaultPolicy_e const isDefaultAllowed, + ZSTD_DefaultPolicy_e const isDefaultAllowed, ZSTD_strategy const strategy); =20 size_t ZSTD_buildCTable(void* dst, size_t dstCapacity, - FSE_CTable* nextCTable, U32 FSELog, symbolEncodingType_e t= ype, + FSE_CTable* nextCTable, U32 FSELog, SymbolEncodingType_e t= ype, unsigned* count, U32 max, const BYTE* codeTable, size_t nbSeq, const S16* defaultNorm, U32 defaultNormLog, U32 defaultMax, @@ -42,7 +44,7 @@ size_t ZSTD_encodeSequences( FSE_CTable const* CTable_MatchLength, BYTE const* mlCodeTable, FSE_CTable const* CTable_OffsetBits, BYTE const* ofCodeTable, FSE_CTable const* CTable_LitLength, BYTE const* llCodeTable, - seqDef const* sequences, size_t nbSeq, int longOffsets, int bm= i2); + SeqDef const* sequences, size_t nbSeq, int longOffsets, int bm= i2); =20 size_t ZSTD_fseBitCost( FSE_CTable const* ctable, diff --git a/lib/zstd/compress/zstd_compress_superblock.c b/lib/zstd/compre= ss/zstd_compress_superblock.c index 17d836cc84e8..dc12d64e935c 100644 --- a/lib/zstd/compress/zstd_compress_superblock.c +++ b/lib/zstd/compress/zstd_compress_superblock.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -36,13 +37,14 @@ * If it is set_compressed, first sub-block's literals section will b= e Treeless_Literals_Block * and the following sub-blocks' literals sections will be Treeless_L= iterals_Block. * @return : compressed size of literals section of a sub-block - * Or 0 if it unable to compress. + * Or 0 if unable to compress. * Or error code */ -static size_t ZSTD_compressSubBlock_literal(const HUF_CElt* hufTable, - const ZSTD_hufCTablesMetadata_t* hufMe= tadata, - const BYTE* literals, size_t litSize, - void* dst, size_t dstSize, - const int bmi2, int writeEntropy, int*= entropyWritten) +static size_t +ZSTD_compressSubBlock_literal(const HUF_CElt* hufTable, + const ZSTD_hufCTablesMetadata_t* hufMetadata, + const BYTE* literals, size_t litSize, + void* dst, size_t dstSize, + const int bmi2, int writeEntropy, int* entro= pyWritten) { size_t const header =3D writeEntropy ? 
200 : 0; size_t const lhSize =3D 3 + (litSize >=3D (1 KB - header)) + (litSize = >=3D (16 KB - header)); @@ -50,11 +52,9 @@ static size_t ZSTD_compressSubBlock_literal(const HUF_CE= lt* hufTable, BYTE* const oend =3D ostart + dstSize; BYTE* op =3D ostart + lhSize; U32 const singleStream =3D lhSize =3D=3D 3; - symbolEncodingType_e hType =3D writeEntropy ? hufMetadata->hType : set= _repeat; + SymbolEncodingType_e hType =3D writeEntropy ? hufMetadata->hType : set= _repeat; size_t cLitSize =3D 0; =20 - (void)bmi2; /* TODO bmi2... */ - DEBUGLOG(5, "ZSTD_compressSubBlock_literal (litSize=3D%zu, lhSize=3D%z= u, writeEntropy=3D%d)", litSize, lhSize, writeEntropy); =20 *entropyWritten =3D 0; @@ -76,9 +76,9 @@ static size_t ZSTD_compressSubBlock_literal(const HUF_CEl= t* hufTable, DEBUGLOG(5, "ZSTD_compressSubBlock_literal (hSize=3D%zu)", hufMeta= data->hufDesSize); } =20 - /* TODO bmi2 */ - { const size_t cSize =3D singleStream ? HUF_compress1X_usingCTable(o= p, oend-op, literals, litSize, hufTable) - : HUF_compress4X_usingCTable(op,= oend-op, literals, litSize, hufTable); + { int const flags =3D bmi2 ? HUF_flags_bmi2 : 0; + const size_t cSize =3D singleStream ? HUF_compress1X_usingCTable(o= p, (size_t)(oend-op), literals, litSize, hufTable, flags) + : HUF_compress4X_usingCTable(op,= (size_t)(oend-op), literals, litSize, hufTable, flags); op +=3D cSize; cLitSize +=3D cSize; if (cSize =3D=3D 0 || ERR_isError(cSize)) { @@ -103,7 +103,7 @@ static size_t ZSTD_compressSubBlock_literal(const HUF_C= Elt* hufTable, switch(lhSize) { case 3: /* 2 - 2 - 10 - 10 */ - { U32 const lhc =3D hType + ((!singleStream) << 2) + ((U32)litSi= ze<<4) + ((U32)cLitSize<<14); + { U32 const lhc =3D hType + ((U32)(!singleStream) << 2) + ((U32)= litSize<<4) + ((U32)cLitSize<<14); MEM_writeLE24(ostart, lhc); break; } @@ -123,26 +123,30 @@ static size_t ZSTD_compressSubBlock_literal(const HUF= _CElt* hufTable, } *entropyWritten =3D 1; DEBUGLOG(5, "Compressed literals: %u -> %u", (U32)litSize, (U32)(op-os= tart)); - return op-ostart; + return (size_t)(op-ostart); } =20 -static size_t ZSTD_seqDecompressedSize(seqStore_t const* seqStore, const s= eqDef* sequences, size_t nbSeq, size_t litSize, int lastSequence) { - const seqDef* const sstart =3D sequences; - const seqDef* const send =3D sequences + nbSeq; - const seqDef* sp =3D sstart; +static size_t +ZSTD_seqDecompressedSize(SeqStore_t const* seqStore, + const SeqDef* sequences, size_t nbSeqs, + size_t litSize, int lastSubBlock) +{ size_t matchLengthSum =3D 0; size_t litLengthSum =3D 0; - (void)(litLengthSum); /* suppress unused variable warning on some envi= ronments */ - while (send-sp > 0) { - ZSTD_sequenceLength const seqLen =3D ZSTD_getSequenceLength(seqSto= re, sp); + size_t n; + for (n=3D0; ncParams.windowLog > STREAM_ACCUM= ULATOR_MIN; BYTE* const ostart =3D (BYTE*)dst; @@ -176,14 +181,14 @@ static size_t ZSTD_compressSubBlock_sequences(const Z= STD_fseCTables_t* fseTables /* Sequences Header */ RETURN_ERROR_IF((oend-op) < 3 /*max nbSeq Size*/ + 1 /*seqHead*/, dstSize_tooSmall, ""); - if (nbSeq < 0x7F) + if (nbSeq < 128) *op++ =3D (BYTE)nbSeq; else if (nbSeq < LONGNBSEQ) op[0] =3D (BYTE)((nbSeq>>8) + 0x80), op[1] =3D (BYTE)nbSeq, op+=3D= 2; else op[0]=3D0xFF, MEM_writeLE16(op+1, (U16)(nbSeq - LONGNBSEQ)), op+= =3D3; if (nbSeq=3D=3D0) { - return op - ostart; + return (size_t)(op - ostart); } =20 /* seqHead : flags for FSE encoding type */ @@ -205,7 +210,7 @@ static size_t ZSTD_compressSubBlock_sequences(const ZST= D_fseCTables_t* fseTables } =20 { size_t const bitstreamSize 
=3D ZSTD_encodeSequences( - op, oend - op, + op, (size_t)(oend - op), fseTables->matchlengthCTable, mlCo= de, fseTables->offcodeCTable, ofCode, fseTables->litlengthCTable, llCode, @@ -249,7 +254,7 @@ static size_t ZSTD_compressSubBlock_sequences(const ZST= D_fseCTables_t* fseTables #endif =20 *entropyWritten =3D 1; - return op - ostart; + return (size_t)(op - ostart); } =20 /* ZSTD_compressSubBlock() : @@ -258,7 +263,7 @@ static size_t ZSTD_compressSubBlock_sequences(const ZST= D_fseCTables_t* fseTables * Or 0 if it failed to compress. */ static size_t ZSTD_compressSubBlock(const ZSTD_entropyCTables_t* entropy, const ZSTD_entropyCTablesMetadata_t* e= ntropyMetadata, - const seqDef* sequences, size_t nbSeq, + const SeqDef* sequences, size_t nbSeq, const BYTE* literals, size_t litSize, const BYTE* llCode, const BYTE* mlCode= , const BYTE* ofCode, const ZSTD_CCtx_params* cctxParams, @@ -275,7 +280,8 @@ static size_t ZSTD_compressSubBlock(const ZSTD_entropyC= Tables_t* entropy, litSize, nbSeq, writeLitEntropy, writeSeqEntropy, lastBloc= k); { size_t cLitSize =3D ZSTD_compressSubBlock_literal((const HUF_CElt*= )entropy->huf.CTable, &entropyMetadata->= hufMetadata, literals, litSize, - op, oend-op, bmi2,= writeLitEntropy, litEntropyWritten); + op, (size_t)(oend-= op), + bmi2, writeLitEntr= opy, litEntropyWritten); FORWARD_IF_ERROR(cLitSize, "ZSTD_compressSubBlock_literal failed"); if (cLitSize =3D=3D 0) return 0; op +=3D cLitSize; @@ -285,18 +291,18 @@ static size_t ZSTD_compressSubBlock(const ZSTD_entrop= yCTables_t* entropy, sequences, nbSeq, llCode, mlCode, ofCode, cctxParams, - op, oend-op, + op, (size_t)(oend-op), bmi2, writeSeqEntropy, s= eqEntropyWritten); FORWARD_IF_ERROR(cSeqSize, "ZSTD_compressSubBlock_sequences failed= "); if (cSeqSize =3D=3D 0) return 0; op +=3D cSeqSize; } /* Write block header */ - { size_t cSize =3D (op-ostart)-ZSTD_blockHeaderSize; + { size_t cSize =3D (size_t)(op-ostart) - ZSTD_blockHeaderSize; U32 const cBlockHeader24 =3D lastBlock + (((U32)bt_compressed)<<1)= + (U32)(cSize << 3); MEM_writeLE24(ostart, cBlockHeader24); } - return op-ostart; + return (size_t)(op-ostart); } =20 static size_t ZSTD_estimateSubBlockSize_literal(const BYTE* literals, size= _t litSize, @@ -322,7 +328,7 @@ static size_t ZSTD_estimateSubBlockSize_literal(const B= YTE* literals, size_t lit return 0; } =20 -static size_t ZSTD_estimateSubBlockSize_symbolType(symbolEncodingType_e ty= pe, +static size_t ZSTD_estimateSubBlockSize_symbolType(SymbolEncodingType_e ty= pe, const BYTE* codeTable, unsigned maxCode, size_t nbSeq, const FSE_CTable* fseCTable, const U8* additionalBits, @@ -385,7 +391,11 @@ static size_t ZSTD_estimateSubBlockSize_sequences(cons= t BYTE* ofCodeTable, return cSeqSizeEstimate + sequencesSectionHeaderSize; } =20 -static size_t ZSTD_estimateSubBlockSize(const BYTE* literals, size_t litSi= ze, +typedef struct { + size_t estLitSize; + size_t estBlockSize; +} EstimatedBlockSize; +static EstimatedBlockSize ZSTD_estimateSubBlockSize(const BYTE* literals, = size_t litSize, const BYTE* ofCodeTable, const BYTE* llCodeTable, const BYTE* mlCodeTable, @@ -393,15 +403,17 @@ static size_t ZSTD_estimateSubBlockSize(const BYTE* l= iterals, size_t litSize, const ZSTD_entropyCTables_t* entro= py, const ZSTD_entropyCTablesMetadata_= t* entropyMetadata, void* workspace, size_t wkspSize, - int writeLitEntropy, int writeSeqE= ntropy) { - size_t cSizeEstimate =3D 0; - cSizeEstimate +=3D ZSTD_estimateSubBlockSize_literal(literals, litSize, - &entropy->huf, &e= ntropyMetadata->hufMetadata, - 
workspace, wkspSi= ze, writeLitEntropy); - cSizeEstimate +=3D ZSTD_estimateSubBlockSize_sequences(ofCodeTable, ll= CodeTable, mlCodeTable, + int writeLitEntropy, int writeSeqE= ntropy) +{ + EstimatedBlockSize ebs; + ebs.estLitSize =3D ZSTD_estimateSubBlockSize_literal(literals, litSize, + &entropy->huf, &en= tropyMetadata->hufMetadata, + workspace, wkspSiz= e, writeLitEntropy); + ebs.estBlockSize =3D ZSTD_estimateSubBlockSize_sequences(ofCodeTable, = llCodeTable, mlCodeTable, nbSeq, &entropy->= fse, &entropyMetadata->fseMetadata, workspace, wkspSi= ze, writeSeqEntropy); - return cSizeEstimate + ZSTD_blockHeaderSize; + ebs.estBlockSize +=3D ebs.estLitSize + ZSTD_blockHeaderSize; + return ebs; } =20 static int ZSTD_needSequenceEntropyTables(ZSTD_fseCTablesMetadata_t const*= fseMetadata) @@ -415,14 +427,57 @@ static int ZSTD_needSequenceEntropyTables(ZSTD_fseCTa= blesMetadata_t const* fseMe return 0; } =20 +static size_t countLiterals(SeqStore_t const* seqStore, const SeqDef* sp, = size_t seqCount) +{ + size_t n, total =3D 0; + assert(sp !=3D NULL); + for (n=3D0; n %zu bytes", = seqCount, (const void*)sp, total); + return total; +} + +#define BYTESCALE 256 + +static size_t sizeBlockSequences(const SeqDef* sp, size_t nbSeqs, + size_t targetBudget, size_t avgLitCost, size_t avgSeqCost, + int firstSubBlock) +{ + size_t n, budget =3D 0, inSize=3D0; + /* entropy headers */ + size_t const headerSize =3D (size_t)firstSubBlock * 120 * BYTESCALE; /= * generous estimate */ + assert(firstSubBlock=3D=3D0 || firstSubBlock=3D=3D1); + budget +=3D headerSize; + + /* first sequence =3D> at least one sequence*/ + budget +=3D sp[0].litLength * avgLitCost + avgSeqCost; + if (budget > targetBudget) return 1; + inSize =3D sp[0].litLength + (sp[0].mlBase+MINMATCH); + + /* loop over sequences */ + for (n=3D1; n targetBudget) + /* though continue to expand until the sub-block is deemed com= pressible */ + && (budget < inSize * BYTESCALE) ) + break; + } + + return n; +} + /* ZSTD_compressSubBlock_multi() : * Breaks super-block into multiple sub-blocks and compresses them. - * Entropy will be written to the first block. - * The following blocks will use repeat mode to compress. - * All sub-blocks are compressed blocks (no raw or rle blocks). - * @return : compressed size of the super block (which is multiple ZSTD b= locks) - * Or 0 if it failed to compress. */ -static size_t ZSTD_compressSubBlock_multi(const seqStore_t* seqStorePtr, + * Entropy will be written into the first block. + * The following blocks use repeat_mode to compress. + * Sub-blocks are all compressed, except the last one when beneficial. + * @return : compressed size of the super block (which features multiple = ZSTD blocks) + * or 0 if it failed to compress. 
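+ * Sizing note (illustrative): sub-block boundaries come from
+ * sizeBlockSequences() above, which spends an average per-block byte
+ * budget expressed in 1/BYTESCALE (=3D 1/256) byte units.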
*/ +static size_t ZSTD_compressSubBlock_multi(const SeqStore_t* seqStorePtr, const ZSTD_compressedBlockState_t* prevCBlock, ZSTD_compressedBlockState_t* nextCBlock, const ZSTD_entropyCTablesMetadata_t* entropyMe= tadata, @@ -432,12 +487,14 @@ static size_t ZSTD_compressSubBlock_multi(const seqSt= ore_t* seqStorePtr, const int bmi2, U32 lastBlock, void* workspace, size_t wkspSize) { - const seqDef* const sstart =3D seqStorePtr->sequencesStart; - const seqDef* const send =3D seqStorePtr->sequences; - const seqDef* sp =3D sstart; + const SeqDef* const sstart =3D seqStorePtr->sequencesStart; + const SeqDef* const send =3D seqStorePtr->sequences; + const SeqDef* sp =3D sstart; /* tracks progress within seqStorePtr->= sequences */ + size_t const nbSeqs =3D (size_t)(send - sstart); const BYTE* const lstart =3D seqStorePtr->litStart; const BYTE* const lend =3D seqStorePtr->lit; const BYTE* lp =3D lstart; + size_t const nbLiterals =3D (size_t)(lend - lstart); BYTE const* ip =3D (BYTE const*)src; BYTE const* const iend =3D ip + srcSize; BYTE* const ostart =3D (BYTE*)dst; @@ -446,112 +503,171 @@ static size_t ZSTD_compressSubBlock_multi(const seq= Store_t* seqStorePtr, const BYTE* llCodePtr =3D seqStorePtr->llCode; const BYTE* mlCodePtr =3D seqStorePtr->mlCode; const BYTE* ofCodePtr =3D seqStorePtr->ofCode; - size_t targetCBlockSize =3D cctxParams->targetCBlockSize; - size_t litSize, seqCount; - int writeLitEntropy =3D entropyMetadata->hufMetadata.hType =3D=3D set_= compressed; + size_t const minTarget =3D ZSTD_TARGETCBLOCKSIZE_MIN; /* enforce minim= um size, to reduce undesirable side effects */ + size_t const targetCBlockSize =3D MAX(minTarget, cctxParams->targetCBl= ockSize); + int writeLitEntropy =3D (entropyMetadata->hufMetadata.hType =3D=3D set= _compressed); int writeSeqEntropy =3D 1; - int lastSequence =3D 0; - - DEBUGLOG(5, "ZSTD_compressSubBlock_multi (litSize=3D%u, nbSeq=3D%u)", - (unsigned)(lend-lp), (unsigned)(send-sstart)); - - litSize =3D 0; - seqCount =3D 0; - do { - size_t cBlockSizeEstimate =3D 0; - if (sstart =3D=3D send) { - lastSequence =3D 1; - } else { - const seqDef* const sequence =3D sp + seqCount; - lastSequence =3D sequence =3D=3D send - 1; - litSize +=3D ZSTD_getSequenceLength(seqStorePtr, sequence).lit= Length; - seqCount++; - } - if (lastSequence) { - assert(lp <=3D lend); - assert(litSize <=3D (size_t)(lend - lp)); - litSize =3D (size_t)(lend - lp); + + DEBUGLOG(5, "ZSTD_compressSubBlock_multi (srcSize=3D%u, litSize=3D%u, = nbSeq=3D%u)", + (unsigned)srcSize, (unsigned)(lend-lstart), (unsigned)(send= -sstart)); + + /* let's start with a general estimation for the full block */ + if (nbSeqs > 0) { + EstimatedBlockSize const ebs =3D + ZSTD_estimateSubBlockSize(lp, nbLiterals, + ofCodePtr, llCodePtr, mlCodePtr, n= bSeqs, + &nextCBlock->entropy, entropyMetad= ata, + workspace, wkspSize, + writeLitEntropy, writeSeqEntropy); + /* quick estimation */ + size_t const avgLitCost =3D nbLiterals ?
(ebs.estLitSize * BYTESCA= LE) / nbLiterals : BYTESCALE; + size_t const avgSeqCost =3D ((ebs.estBlockSize - ebs.estLitSize) *= BYTESCALE) / nbSeqs; + const size_t nbSubBlocks =3D MAX((ebs.estBlockSize + (targetCBlock= Size/2)) / targetCBlockSize, 1); + size_t n, avgBlockBudget, blockBudgetSupp=3D0; + avgBlockBudget =3D (ebs.estBlockSize * BYTESCALE) / nbSubBlocks; + DEBUGLOG(5, "estimated fullblock size=3D%u bytes ; avgLitCost=3D%.= 2f ; avgSeqCost=3D%.2f ; targetCBlockSize=3D%u, nbSubBlocks=3D%u ; avgBlock= Budget=3D%.0f bytes", + (unsigned)ebs.estBlockSize, (double)avgLitCost/BYTESCA= LE, (double)avgSeqCost/BYTESCALE, + (unsigned)targetCBlockSize, (unsigned)nbSubBlocks, (do= uble)avgBlockBudget/BYTESCALE); + /* simplification: if estimates states that the full superblock do= esn't compress, just bail out immediately + * this will result in the production of a single uncompressed blo= ck covering @srcSize.*/ + if (ebs.estBlockSize > srcSize) return 0; + + /* compress and write sub-blocks */ + assert(nbSubBlocks>0); + for (n=3D0; n < nbSubBlocks-1; n++) { + /* determine nb of sequences for current sub-block + nbLiteral= s from next sequence */ + size_t const seqCount =3D sizeBlockSequences(sp, (size_t)(send= -sp), + avgBlockBudget + blockBudgetSupp, = avgLitCost, avgSeqCost, n=3D=3D0); + /* if reached last sequence : break to last sub-block (simplif= ication) */ + assert(seqCount <=3D (size_t)(send-sp)); + if (sp + seqCount =3D=3D send) break; + assert(seqCount > 0); + /* compress sub-block */ + { int litEntropyWritten =3D 0; + int seqEntropyWritten =3D 0; + size_t litSize =3D countLiterals(seqStorePtr, sp, seqCount= ); + const size_t decompressedSize =3D + ZSTD_seqDecompressedSize(seqStorePtr, sp, seqCount= , litSize, 0); + size_t const cSize =3D ZSTD_compressSubBlock(&nextCBlock->= entropy, entropyMetadata, + sp, seqCount, + lp, litSize, + llCodePtr, mlCodePtr, ofCo= dePtr, + cctxParams, + op, (size_t)(oend-op), + bmi2, writeLitEntropy, wri= teSeqEntropy, + &litEntropyWritten, &seqEn= tropyWritten, + 0); + FORWARD_IF_ERROR(cSize, "ZSTD_compressSubBlock failed"); + + /* check compressibility, update state components */ + if (cSize > 0 && cSize < decompressedSize) { + DEBUGLOG(5, "Committed sub-block compressing %u bytes = =3D> %u bytes", + (unsigned)decompressedSize, (unsigned)cSiz= e); + assert(ip + decompressedSize <=3D iend); + ip +=3D decompressedSize; + lp +=3D litSize; + op +=3D cSize; + llCodePtr +=3D seqCount; + mlCodePtr +=3D seqCount; + ofCodePtr +=3D seqCount; + /* Entropy only needs to be written once */ + if (litEntropyWritten) { + writeLitEntropy =3D 0; + } + if (seqEntropyWritten) { + writeSeqEntropy =3D 0; + } + sp +=3D seqCount; + blockBudgetSupp =3D 0; + } } + /* otherwise : do not compress yet, coalesce current sub-block= with following one */ } - /* I think there is an optimization opportunity here. - * Calling ZSTD_estimateSubBlockSize for every sequence can be was= teful - * since it recalculates estimate from scratch. - * For example, it would recount literal distribution and symbol c= odes every time. 
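The code above replaces the per-sequence re-estimation lamented in the removed comment with a single whole-block estimate: per-literal and per-sequence costs are derived once in 1/256-byte fixed-point units (BYTESCALE), and sizeBlockSequences then packs sequences against a per-sub-block budget. Below is a standalone sketch of that arithmetic with made-up sizes; only BYTESCALE and the rounding scheme come from the patch, the variable names merely mirror it for readability.

#include <stddef.h>
#include <stdio.h>

#define BYTESCALE 256  /* 8.8 fixed point: 1 byte == 256 units */

int main(void)
{
    size_t const estBlockSize = 40000;  /* estimated compressed size, bytes (toy value) */
    size_t const estLitSize   = 25000;  /* literals' share of that estimate (toy value) */
    size_t const nbLiterals   = 90000;
    size_t const nbSeqs       = 12000;
    size_t const targetCBlockSize = 1300;

    /* average costs in 1/256-byte units, so integer math keeps sub-byte precision */
    size_t const avgLitCost = (estLitSize * BYTESCALE) / nbLiterals;
    size_t const avgSeqCost = ((estBlockSize - estLitSize) * BYTESCALE) / nbSeqs;
    /* nearest integer number of sub-blocks, at least 1 */
    size_t nbSubBlocks = (estBlockSize + targetCBlockSize / 2) / targetCBlockSize;
    if (nbSubBlocks < 1) nbSubBlocks = 1;
    {   size_t const avgBlockBudget = (estBlockSize * BYTESCALE) / nbSubBlocks;
        printf("lit cost %.2f B, seq cost %.2f B, %zu sub-blocks, budget %.0f B\n",
               (double)avgLitCost / BYTESCALE, (double)avgSeqCost / BYTESCALE,
               nbSubBlocks, (double)avgBlockBudget / BYTESCALE);
    }
    return 0;
}

Staying in scaled integers avoids floating point in the hot path; as in the patch, the doubles here exist only to pretty-print the result.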
- */ - cBlockSizeEstimate =3D ZSTD_estimateSubBlockSize(lp, litSize, ofCo= dePtr, llCodePtr, mlCodePtr, seqCount, - &nextCBlock->entrop= y, entropyMetadata, - workspace, wkspSize= , writeLitEntropy, writeSeqEntropy); - if (cBlockSizeEstimate > targetCBlockSize || lastSequence) { - int litEntropyWritten =3D 0; - int seqEntropyWritten =3D 0; - const size_t decompressedSize =3D ZSTD_seqDecompressedSize(seq= StorePtr, sp, seqCount, litSize, lastSequence); - const size_t cSize =3D ZSTD_compressSubBlock(&nextCBlock->entr= opy, entropyMetadata, - sp, seqCount, - lp, litSize, - llCodePtr, mlCodePt= r, ofCodePtr, - cctxParams, - op, oend-op, - bmi2, writeLitEntro= py, writeSeqEntropy, - &litEntropyWritten,= &seqEntropyWritten, - lastBlock && lastSe= quence); - FORWARD_IF_ERROR(cSize, "ZSTD_compressSubBlock failed"); - if (cSize > 0 && cSize < decompressedSize) { - DEBUGLOG(5, "Committed the sub-block"); - assert(ip + decompressedSize <=3D iend); - ip +=3D decompressedSize; - sp +=3D seqCount; - lp +=3D litSize; - op +=3D cSize; - llCodePtr +=3D seqCount; - mlCodePtr +=3D seqCount; - ofCodePtr +=3D seqCount; - litSize =3D 0; - seqCount =3D 0; - /* Entropy only needs to be written once */ - if (litEntropyWritten) { - writeLitEntropy =3D 0; - } - if (seqEntropyWritten) { - writeSeqEntropy =3D 0; - } + } /* if (nbSeqs > 0) */ + + /* write last block */ + DEBUGLOG(5, "Generate last sub-block: %u sequences remaining", (unsign= ed)(send - sp)); + { int litEntropyWritten =3D 0; + int seqEntropyWritten =3D 0; + size_t litSize =3D (size_t)(lend - lp); + size_t seqCount =3D (size_t)(send - sp); + const size_t decompressedSize =3D + ZSTD_seqDecompressedSize(seqStorePtr, sp, seqCount, litSiz= e, 1); + size_t const cSize =3D ZSTD_compressSubBlock(&nextCBlock->entropy,= entropyMetadata, + sp, seqCount, + lp, litSize, + llCodePtr, mlCodePtr, ofCodePt= r, + cctxParams, + op, (size_t)(oend-op), + bmi2, writeLitEntropy, writeSe= qEntropy, + &litEntropyWritten, &seqEntrop= yWritten, + lastBlock); + FORWARD_IF_ERROR(cSize, "ZSTD_compressSubBlock failed"); + + /* update pointers, the nb of literals borrowed from next sequence= must be preserved */ + if (cSize > 0 && cSize < decompressedSize) { + DEBUGLOG(5, "Last sub-block compressed %u bytes =3D> %u bytes", + (unsigned)decompressedSize, (unsigned)cSize); + assert(ip + decompressedSize <=3D iend); + ip +=3D decompressedSize; + lp +=3D litSize; + op +=3D cSize; + llCodePtr +=3D seqCount; + mlCodePtr +=3D seqCount; + ofCodePtr +=3D seqCount; + /* Entropy only needs to be written once */ + if (litEntropyWritten) { + writeLitEntropy =3D 0; } + if (seqEntropyWritten) { + writeSeqEntropy =3D 0; + } + sp +=3D seqCount; } - } while (!lastSequence); + } + + if (writeLitEntropy) { - DEBUGLOG(5, "ZSTD_compressSubBlock_multi has literal entropy table= s unwritten"); + DEBUGLOG(5, "Literal entropy tables were never written"); ZSTD_memcpy(&nextCBlock->entropy.huf, &prevCBlock->entropy.huf, si= zeof(prevCBlock->entropy.huf)); } if (writeSeqEntropy && ZSTD_needSequenceEntropyTables(&entropyMetadata= ->fseMetadata)) { /* If we haven't written our entropy tables, then we've violated o= ur contract and * must emit an uncompressed block. 
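A sub-block is committed above only when compression genuinely paid off; otherwise its sequences are coalesced into the next sub-block. That is also how the situation described in this comment arises: if no sub-block ever commits, the entropy tables are never written and the function must fall back to an uncompressed block. A toy rendering of the commit test follows (the helper name is hypothetical, not from the patch):

#include <stddef.h>
#include <stdio.h>

/* Keep a sub-block only if compression succeeded (cSize != 0) and it
 * actually saved space; otherwise the caller merges its sequences into
 * the following sub-block. */
static int commitSubBlock(size_t cSize, size_t decompressedSize)
{
    return (cSize > 0) && (cSize < decompressedSize);
}

int main(void)
{
    printf("%d\n", commitSubBlock(900, 1300));  /* 1: smaller, commit   */
    printf("%d\n", commitSubBlock(0,   1300));  /* 0: failed to shrink  */
    printf("%d\n", commitSubBlock(1400, 1300)); /* 0: expanded, coalesce */
    return 0;
}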
*/ - DEBUGLOG(5, "ZSTD_compressSubBlock_multi has sequence entropy tabl= es unwritten"); + DEBUGLOG(5, "Sequence entropy tables were never written =3D> cance= l, emit an uncompressed block"); return 0; } + if (ip < iend) { - size_t const cSize =3D ZSTD_noCompressBlock(op, oend - op, ip, ien= d - ip, lastBlock); - DEBUGLOG(5, "ZSTD_compressSubBlock_multi last sub-block uncompress= ed, %zu bytes", (size_t)(iend - ip)); + /* some data left : last part of the block sent uncompressed */ + size_t const rSize =3D (size_t)((iend - ip)); + size_t const cSize =3D ZSTD_noCompressBlock(op, (size_t)(oend - op= ), ip, rSize, lastBlock); + DEBUGLOG(5, "Generate last uncompressed sub-block of %u bytes", (u= nsigned)(rSize)); FORWARD_IF_ERROR(cSize, "ZSTD_noCompressBlock failed"); assert(cSize !=3D 0); op +=3D cSize; /* We have to regenerate the repcodes because we've skipped some s= equences */ if (sp < send) { - seqDef const* seq; - repcodes_t rep; + const SeqDef* seq; + Repcodes_t rep; ZSTD_memcpy(&rep, prevCBlock->rep, sizeof(rep)); for (seq =3D sstart; seq < sp; ++seq) { - ZSTD_updateRep(rep.rep, seq->offBase - 1, ZSTD_getSequence= Length(seqStorePtr, seq).litLength =3D=3D 0); + ZSTD_updateRep(rep.rep, seq->offBase, ZSTD_getSequenceLeng= th(seqStorePtr, seq).litLength =3D=3D 0); } ZSTD_memcpy(nextCBlock->rep, &rep, sizeof(rep)); } } - DEBUGLOG(5, "ZSTD_compressSubBlock_multi compressed"); - return op-ostart; + + DEBUGLOG(5, "ZSTD_compressSubBlock_multi compressed all subBlocks: tot= al compressed size =3D %u", + (unsigned)(op-ostart)); + return (size_t)(op-ostart); } =20 size_t ZSTD_compressSuperBlock(ZSTD_CCtx* zc, void* dst, size_t dstCapacity, - void const* src, size_t srcSize, - unsigned lastBlock) { + const void* src, size_t srcSize, + unsigned lastBlock) +{ ZSTD_entropyCTablesMetadata_t entropyMetadata; =20 FORWARD_IF_ERROR(ZSTD_buildBlockEntropyStats(&zc->seqStore, @@ -559,7 +675,7 @@ size_t ZSTD_compressSuperBlock(ZSTD_CCtx* zc, &zc->blockState.nextCBlock->entropy, &zc->appliedParams, &entropyMetadata, - zc->entropyWorkspace, ENTROPY_WORKSPACE_SIZE /* statically alloc= ated in resetCCtx */), ""); + zc->tmpWorkspace, zc->tmpWkspSize /* statically allocated in res= etCCtx */), ""); =20 return ZSTD_compressSubBlock_multi(&zc->seqStore, zc->blockState.prevCBlock, @@ -569,5 +685,5 @@ size_t ZSTD_compressSuperBlock(ZSTD_CCtx* zc, dst, dstCapacity, src, srcSize, zc->bmi2, lastBlock, - zc->entropyWorkspace, ENTROPY_WORKSPACE_SIZE /* statically all= ocated in resetCCtx */); + zc->tmpWorkspace, zc->tmpWkspSize /* statically allocated in r= esetCCtx */); } diff --git a/lib/zstd/compress/zstd_compress_superblock.h b/lib/zstd/compre= ss/zstd_compress_superblock.h index 224ece79546e..826bbc9e029b 100644 --- a/lib/zstd/compress/zstd_compress_superblock.h +++ b/lib/zstd/compress/zstd_compress_superblock.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the diff --git a/lib/zstd/compress/zstd_cwksp.h b/lib/zstd/compress/zstd_cwksp.h index 349fc923c355..dce42f653bae 100644 --- a/lib/zstd/compress/zstd_cwksp.h +++ b/lib/zstd/compress/zstd_cwksp.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -14,8 +15,10 @@ /*-************************************* * Dependencies ***************************************/ +#include "../common/allocations.h" /* ZSTD_customMalloc, ZSTD_customFree = */ #include "../common/zstd_internal.h" - +#include "../common/portability_macros.h" +#include "../common/compiler.h" /* ZSTD_isPower2 */ =20 /*-************************************* * Constants @@ -41,8 +44,9 @@ ***************************************/ typedef enum { ZSTD_cwksp_alloc_objects, - ZSTD_cwksp_alloc_buffers, - ZSTD_cwksp_alloc_aligned + ZSTD_cwksp_alloc_aligned_init_once, + ZSTD_cwksp_alloc_aligned, + ZSTD_cwksp_alloc_buffers } ZSTD_cwksp_alloc_phase_e; =20 /* @@ -95,8 +99,8 @@ typedef enum { * * Workspace Layout: * - * [ ... workspace ... ] - * [objects][tables ... ->] free space [<- ... aligned][<- ... buffers] + * [ ... workspace ... ] + * [objects][tables ->] free space [<- buffers][<- aligned][<- init once] * * The various objects that live in the workspace are divided into the * following categories, and are allocated separately: @@ -120,9 +124,18 @@ typedef enum { * uint32_t arrays, all of whose values are between 0 and (nextSrc - bas= e). * Their sizes depend on the cparams. These tables are 64-byte aligned. * - * - Aligned: these buffers are used for various purposes that require 4 b= yte - * alignment, but don't require any initialization before they're used. = These - * buffers are each aligned to 64 bytes. + * - Init once: these buffers need to be initialized at least once befo= re + * use. They should be used when we want to skip memory initialization + * while not triggering memory checkers (like Valgrind) when reading from + * this memory without writing to it first. + * These buffers should be used carefully as they might contain data + * from previous compressions. + * Buffers are aligned to 64 bytes. + * + * - Aligned: these buffers don't require any initialization before they're + * used. The user of the buffer should make sure they write into a buffer + * location before reading from it. + * Buffers are aligned to 64 bytes. * * - Buffers: these buffers are used for various purposes that don't requi= re * any alignment or initialization before they're used. This means they = can @@ -134,8 +147,9 @@ typedef enum { * correctly packed into the workspace buffer. That order is: * * 1. Objects - * 2. Buffers - * 3. Aligned/Tables + * 2. Init once / Tables + * 3. Aligned / Tables + * 4. Buffers / Tables * * Attempts to reserve objects of different types out of order will fail. */ @@ -147,6 +161,7 @@ typedef struct { void* tableEnd; void* tableValidEnd; void* allocStart; + void* initOnceStart; =20 BYTE allocFailed; int workspaceOversizedDuration; @@ -159,6 +174,7 @@ typedef struct { ***************************************/ =20 MEM_STATIC size_t ZSTD_cwksp_available_space(ZSTD_cwksp* ws); +MEM_STATIC void* ZSTD_cwksp_initialAllocStart(ZSTD_cwksp* ws); =20 MEM_STATIC void ZSTD_cwksp_assert_internal_consistency(ZSTD_cwksp* ws) { (void)ws; @@ -168,14 +184,16 @@ MEM_STATIC void ZSTD_cwksp_assert_internal_consistenc= y(ZSTD_cwksp* ws) { assert(ws->tableEnd <=3D ws->allocStart); assert(ws->tableValidEnd <=3D ws->allocStart); assert(ws->allocStart <=3D ws->workspaceEnd); + assert(ws->initOnceStart <=3D ZSTD_cwksp_initialAllocStart(ws)); + assert(ws->workspace <=3D ws->initOnceStart); } =20 /* * Align must be a power of 2.
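ZSTD_cwksp_align, defined just below this comment, rounds a size up with the standard power-of-two mask identity: adding align-1 and then clearing the low bits lands on the next boundary. A minimal self-contained check of that identity (alignUp is an illustrative stand-in for the real function):

#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Round size up to a multiple of align; align must be a power of 2,
 * so align-1 sets exactly the bits below the boundary. */
static size_t alignUp(size_t size, size_t align)
{
    size_t const mask = align - 1;
    assert((align & mask) == 0);   /* power-of-2 check, as the patch asserts */
    return (size + mask) & ~mask;
}

int main(void)
{
    printf("%zu %zu %zu\n", alignUp(1, 64), alignUp(64, 64), alignUp(65, 64));
    /* prints: 64 64 128 */
    return 0;
}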
*/ -MEM_STATIC size_t ZSTD_cwksp_align(size_t size, size_t const align) { +MEM_STATIC size_t ZSTD_cwksp_align(size_t size, size_t align) { size_t const mask =3D align - 1; - assert((align & mask) =3D=3D 0); + assert(ZSTD_isPower2(align)); return (size + mask) & ~mask; } =20 @@ -189,7 +207,7 @@ MEM_STATIC size_t ZSTD_cwksp_align(size_t size, size_t = const align) { * to figure out how much space you need for the matchState tables. Everyt= hing * else is though. * - * Do not use for sizing aligned buffers. Instead, use ZSTD_cwksp_aligned_= alloc_size(). + * Do not use for sizing aligned buffers. Instead, use ZSTD_cwksp_aligned6= 4_alloc_size(). */ MEM_STATIC size_t ZSTD_cwksp_alloc_size(size_t size) { if (size =3D=3D 0) @@ -197,12 +215,16 @@ MEM_STATIC size_t ZSTD_cwksp_alloc_size(size_t size) { return size; } =20 +MEM_STATIC size_t ZSTD_cwksp_aligned_alloc_size(size_t size, size_t alignm= ent) { + return ZSTD_cwksp_alloc_size(ZSTD_cwksp_align(size, alignment)); +} + /* * Returns an adjusted alloc size that is the nearest larger multiple of 6= 4 bytes. * Used to determine the number of bytes required for a given "aligned". */ -MEM_STATIC size_t ZSTD_cwksp_aligned_alloc_size(size_t size) { - return ZSTD_cwksp_alloc_size(ZSTD_cwksp_align(size, ZSTD_CWKSP_ALIGNME= NT_BYTES)); +MEM_STATIC size_t ZSTD_cwksp_aligned64_alloc_size(size_t size) { + return ZSTD_cwksp_aligned_alloc_size(size, ZSTD_CWKSP_ALIGNMENT_BYTES); } =20 /* @@ -210,14 +232,10 @@ MEM_STATIC size_t ZSTD_cwksp_aligned_alloc_size(size_= t size) { * for internal purposes (currently only alignment). */ MEM_STATIC size_t ZSTD_cwksp_slack_space_required(void) { - /* For alignment, the wksp will always allocate an additional n_1=3D[1= , 64] bytes - * to align the beginning of tables section, as well as another n_2=3D= [0, 63] bytes - * to align the beginning of the aligned section. - * - * n_1 + n_2 =3D=3D 64 bytes if the cwksp is freshly allocated, due to= tables and - * aligneds being sized in multiples of 64 bytes. + /* For alignment, the wksp will always allocate an additional 2*ZSTD_C= WKSP_ALIGNMENT_BYTES + * bytes to align the beginning of tables section and end of buffers; */ - size_t const slackSpace =3D ZSTD_CWKSP_ALIGNMENT_BYTES; + size_t const slackSpace =3D ZSTD_CWKSP_ALIGNMENT_BYTES * 2; return slackSpace; } =20 @@ -229,11 +247,23 @@ MEM_STATIC size_t ZSTD_cwksp_slack_space_required(voi= d) { MEM_STATIC size_t ZSTD_cwksp_bytes_to_align_ptr(void* ptr, const size_t al= ignBytes) { size_t const alignBytesMask =3D alignBytes - 1; size_t const bytes =3D (alignBytes - ((size_t)ptr & (alignBytesMask)))= & alignBytesMask; - assert((alignBytes & alignBytesMask) =3D=3D 0); - assert(bytes !=3D ZSTD_CWKSP_ALIGNMENT_BYTES); + assert(ZSTD_isPower2(alignBytes)); + assert(bytes < alignBytes); return bytes; } =20 +/* + * Returns the initial value for allocStart which is used to determine the= position from + * which we can allocate from the end of the workspace. + */ +MEM_STATIC void* ZSTD_cwksp_initialAllocStart(ZSTD_cwksp* ws) +{ + char* endPtr =3D (char*)ws->workspaceEnd; + assert(ZSTD_isPower2(ZSTD_CWKSP_ALIGNMENT_BYTES)); + endPtr =3D endPtr - ((size_t)endPtr % ZSTD_CWKSP_ALIGNMENT_BYTES); + return (void*)endPtr; +} + /* * Internal function. Do not use directly. 
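ZSTD_cwksp_initialAllocStart above anchors the downward-growing segment: since buffers and aligneds are carved from the end of the workspace, the starting point is the end pointer rounded down to a 64-byte boundary. A small sketch of that rounding, with ALIGN standing in for ZSTD_CWKSP_ALIGNMENT_BYTES (the patch casts through size_t; uintptr_t is used here for portability):

#include <stdint.h>
#include <stdio.h>

#define ALIGN 64

/* Round an end pointer *down* to the nearest 64-byte boundary, mirroring
 * ZSTD_cwksp_initialAllocStart(): allocations then grow downward from it. */
static char* initialAllocStart(char* workspaceEnd)
{
    return workspaceEnd - ((uintptr_t)workspaceEnd % ALIGN);
}

int main(void)
{
    char buffer[256];
    char* end = buffer + sizeof(buffer);
    char* start = initialAllocStart(end);
    printf("dropped %td bytes to reach a 64-byte boundary\n", end - start);
    return 0;
}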
* Reserves the given number of bytes within the aligned/buffer segment of= the wksp, @@ -246,7 +276,7 @@ ZSTD_cwksp_reserve_internal_buffer_space(ZSTD_cwksp* ws= , size_t const bytes) { void* const alloc =3D (BYTE*)ws->allocStart - bytes; void* const bottom =3D ws->tableEnd; - DEBUGLOG(5, "cwksp: reserving %p %zd bytes, %zd bytes remaining", + DEBUGLOG(5, "cwksp: reserving [0x%p]:%zd bytes; %zd bytes remaining", alloc, bytes, ZSTD_cwksp_available_space(ws) - bytes); ZSTD_cwksp_assert_internal_consistency(ws); assert(alloc >=3D bottom); @@ -274,27 +304,16 @@ ZSTD_cwksp_internal_advance_phase(ZSTD_cwksp* ws, ZST= D_cwksp_alloc_phase_e phase { assert(phase >=3D ws->phase); if (phase > ws->phase) { - /* Going from allocating objects to allocating buffers */ - if (ws->phase < ZSTD_cwksp_alloc_buffers && - phase >=3D ZSTD_cwksp_alloc_buffers) { + /* Going from allocating objects to allocating initOnce / tables */ + if (ws->phase < ZSTD_cwksp_alloc_aligned_init_once && + phase >=3D ZSTD_cwksp_alloc_aligned_init_once) { ws->tableValidEnd =3D ws->objectEnd; - } + ws->initOnceStart =3D ZSTD_cwksp_initialAllocStart(ws); =20 - /* Going from allocating buffers to allocating aligneds/tables */ - if (ws->phase < ZSTD_cwksp_alloc_aligned && - phase >=3D ZSTD_cwksp_alloc_aligned) { - { /* Align the start of the "aligned" to 64 bytes. Use [1, 6= 4] bytes. */ - size_t const bytesToAlign =3D - ZSTD_CWKSP_ALIGNMENT_BYTES - ZSTD_cwksp_bytes_to_align= _ptr(ws->allocStart, ZSTD_CWKSP_ALIGNMENT_BYTES); - DEBUGLOG(5, "reserving aligned alignment addtl space: %zu"= , bytesToAlign); - ZSTD_STATIC_ASSERT((ZSTD_CWKSP_ALIGNMENT_BYTES & (ZSTD_CWK= SP_ALIGNMENT_BYTES - 1)) =3D=3D 0); /* power of 2 */ - RETURN_ERROR_IF(!ZSTD_cwksp_reserve_internal_buffer_space(= ws, bytesToAlign), - memory_allocation, "aligned phase - alignm= ent initial allocation failed!"); - } { /* Align the start of the tables to 64 bytes. Use [0, 63] = bytes */ - void* const alloc =3D ws->objectEnd; + void *const alloc =3D ws->objectEnd; size_t const bytesToAlign =3D ZSTD_cwksp_bytes_to_align_pt= r(alloc, ZSTD_CWKSP_ALIGNMENT_BYTES); - void* const objectEnd =3D (BYTE*)alloc + bytesToAlign; + void *const objectEnd =3D (BYTE *) alloc + bytesToAlign; DEBUGLOG(5, "reserving table alignment addtl space: %zu", = bytesToAlign); RETURN_ERROR_IF(objectEnd > ws->workspaceEnd, memory_alloc= ation, "table phase - alignment initial allocatio= n failed!"); @@ -302,7 +321,9 @@ ZSTD_cwksp_internal_advance_phase(ZSTD_cwksp* ws, ZSTD_= cwksp_alloc_phase_e phase ws->tableEnd =3D objectEnd; /* table area starts being em= pty */ if (ws->tableValidEnd < ws->tableEnd) { ws->tableValidEnd =3D ws->tableEnd; - } } } + } + } + } ws->phase =3D phase; ZSTD_cwksp_assert_internal_consistency(ws); } @@ -314,7 +335,7 @@ ZSTD_cwksp_internal_advance_phase(ZSTD_cwksp* ws, ZSTD_= cwksp_alloc_phase_e phase */ MEM_STATIC int ZSTD_cwksp_owns_buffer(const ZSTD_cwksp* ws, const void* pt= r) { - return (ptr !=3D NULL) && (ws->workspace <=3D ptr) && (ptr <=3D ws->wo= rkspaceEnd); + return (ptr !=3D NULL) && (ws->workspace <=3D ptr) && (ptr < ws->works= paceEnd); } =20 /* @@ -345,29 +366,61 @@ MEM_STATIC BYTE* ZSTD_cwksp_reserve_buffer(ZSTD_cwksp= * ws, size_t bytes) =20 /* * Reserves and returns memory sized on and aligned on ZSTD_CWKSP_ALIGNMEN= T_BYTES (64 bytes). + * This memory has been initialized at least once in the past. + * This doesn't mean it has been initialized this time, and it might conta= in data from previous + * operations. 
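The init-once discipline described above boils down to a low-water mark: everything at or above initOnceStart has been zeroed at least once already, so a new reservation only needs to clear the not-yet-touched prefix before lowering the mark. A simplified standalone model of that zeroing logic follows; the real ZSTD_cwksp_reserve_aligned_init_once also handles alignment, allocation failure, and the sanitizer caveats spelled out in its comment.

#include <stddef.h>
#include <stdio.h>
#include <string.h>

static char workspace[1024];
/* low-water mark: bytes at or above this offset were zeroed before */
static char* initOnceStart = workspace + sizeof(workspace);

static void* reserveInitOnce(char* ptr, size_t bytes)
{
    if (ptr < initOnceStart) {
        /* only the fresh region [ptr, initOnceStart) needs clearing */
        size_t const fresh = (size_t)(initOnceStart - ptr);
        memset(ptr, 0, fresh < bytes ? fresh : bytes);
        initOnceStart = ptr;
    }
    return ptr;
}

int main(void)
{
    reserveInitOnce(workspace + 768, 256);  /* zeroes 256 fresh bytes      */
    reserveInitOnce(workspace + 512, 512);  /* zeroes only [512,768) now   */
    printf("mark now at offset %td\n", initOnceStart - workspace);
    return 0;
}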
+ * The main usage is for algorithms that might need read access into unini= tialized memory. + * The algorithm must maintain safety under these conditions and must make= sure it doesn't + * leak any of the past data (directly or in side channels). */ -MEM_STATIC void* ZSTD_cwksp_reserve_aligned(ZSTD_cwksp* ws, size_t bytes) +MEM_STATIC void* ZSTD_cwksp_reserve_aligned_init_once(ZSTD_cwksp* ws, size= _t bytes) { - void* ptr =3D ZSTD_cwksp_reserve_internal(ws, ZSTD_cwksp_align(bytes, = ZSTD_CWKSP_ALIGNMENT_BYTES), - ZSTD_cwksp_alloc_aligned); - assert(((size_t)ptr & (ZSTD_CWKSP_ALIGNMENT_BYTES-1))=3D=3D 0); + size_t const alignedBytes =3D ZSTD_cwksp_align(bytes, ZSTD_CWKSP_ALIGN= MENT_BYTES); + void* ptr =3D ZSTD_cwksp_reserve_internal(ws, alignedBytes, ZSTD_cwksp= _alloc_aligned_init_once); + assert(((size_t)ptr & (ZSTD_CWKSP_ALIGNMENT_BYTES-1)) =3D=3D 0); + if(ptr && ptr < ws->initOnceStart) { + /* We assume the memory following the current allocation is either: + * 1. Not usable as initOnce memory (end of workspace) + * 2. Another initOnce buffer that has been allocated before (and = so was previously memset) + * 3. An ASAN redzone, in which case we don't want to write on it + * For these reasons it should be fine to not explicitly zero ever= y byte up to ws->initOnceStart. + * Note that we assume here that MSAN and ASAN cannot run in the s= ame time. */ + ZSTD_memset(ptr, 0, MIN((size_t)((U8*)ws->initOnceStart - (U8*)ptr= ), alignedBytes)); + ws->initOnceStart =3D ptr; + } + return ptr; +} + +/* + * Reserves and returns memory sized on and aligned on ZSTD_CWKSP_ALIGNMEN= T_BYTES (64 bytes). + */ +MEM_STATIC void* ZSTD_cwksp_reserve_aligned64(ZSTD_cwksp* ws, size_t bytes) +{ + void* const ptr =3D ZSTD_cwksp_reserve_internal(ws, + ZSTD_cwksp_align(bytes, ZSTD_CWKSP_ALIGNMENT_BYTES= ), + ZSTD_cwksp_alloc_aligned); + assert(((size_t)ptr & (ZSTD_CWKSP_ALIGNMENT_BYTES-1)) =3D=3D 0); return ptr; } =20 /* * Aligned on 64 bytes. These buffers have the special property that - * their values remain constrained, allowing us to re-use them without + * their values remain constrained, allowing us to reuse them without * memset()-ing them. */ MEM_STATIC void* ZSTD_cwksp_reserve_table(ZSTD_cwksp* ws, size_t bytes) { - const ZSTD_cwksp_alloc_phase_e phase =3D ZSTD_cwksp_alloc_aligned; + const ZSTD_cwksp_alloc_phase_e phase =3D ZSTD_cwksp_alloc_aligned_init= _once; void* alloc; void* end; void* top; =20 - if (ZSTD_isError(ZSTD_cwksp_internal_advance_phase(ws, phase))) { - return NULL; + /* We can only start allocating tables after we are done reserving spa= ce for objects at the + * start of the workspace */ + if(ws->phase < phase) { + if (ZSTD_isError(ZSTD_cwksp_internal_advance_phase(ws, phase))) { + return NULL; + } } alloc =3D ws->tableEnd; end =3D (BYTE *)alloc + bytes; @@ -387,7 +440,7 @@ MEM_STATIC void* ZSTD_cwksp_reserve_table(ZSTD_cwksp* w= s, size_t bytes) =20 =20 assert((bytes & (ZSTD_CWKSP_ALIGNMENT_BYTES-1)) =3D=3D 0); - assert(((size_t)alloc & (ZSTD_CWKSP_ALIGNMENT_BYTES-1))=3D=3D 0); + assert(((size_t)alloc & (ZSTD_CWKSP_ALIGNMENT_BYTES-1)) =3D=3D 0); return alloc; } =20 @@ -421,6 +474,20 @@ MEM_STATIC void* ZSTD_cwksp_reserve_object(ZSTD_cwksp*= ws, size_t bytes) =20 return alloc; } +/* + * with alignment control + * Note : should happen only once, at workspace first initialization + */ +MEM_STATIC void* ZSTD_cwksp_reserve_object_aligned(ZSTD_cwksp* ws, size_t = byteSize, size_t alignment) +{ + size_t const mask =3D alignment - 1; + size_t const surplus =3D (alignment > sizeof(void*)) ? 
alignment - siz= eof(void*) : 0; + void* const start =3D ZSTD_cwksp_reserve_object(ws, byteSize + surplus= ); + if (start =3D=3D NULL) return NULL; + if (surplus =3D=3D 0) return start; + assert(ZSTD_isPower2(alignment)); + return (void*)(((size_t)start + surplus) & ~mask); +} =20 MEM_STATIC void ZSTD_cwksp_mark_tables_dirty(ZSTD_cwksp* ws) { @@ -451,7 +518,7 @@ MEM_STATIC void ZSTD_cwksp_clean_tables(ZSTD_cwksp* ws)= { assert(ws->tableValidEnd >=3D ws->objectEnd); assert(ws->tableValidEnd <=3D ws->allocStart); if (ws->tableValidEnd < ws->tableEnd) { - ZSTD_memset(ws->tableValidEnd, 0, (BYTE*)ws->tableEnd - (BYTE*)ws-= >tableValidEnd); + ZSTD_memset(ws->tableValidEnd, 0, (size_t)((BYTE*)ws->tableEnd - (= BYTE*)ws->tableValidEnd)); } ZSTD_cwksp_mark_tables_clean(ws); } @@ -460,7 +527,8 @@ MEM_STATIC void ZSTD_cwksp_clean_tables(ZSTD_cwksp* ws)= { * Invalidates table allocations. * All other allocations remain valid. */ -MEM_STATIC void ZSTD_cwksp_clear_tables(ZSTD_cwksp* ws) { +MEM_STATIC void ZSTD_cwksp_clear_tables(ZSTD_cwksp* ws) +{ DEBUGLOG(4, "cwksp: clearing tables!"); =20 =20 @@ -478,14 +546,23 @@ MEM_STATIC void ZSTD_cwksp_clear(ZSTD_cwksp* ws) { =20 =20 ws->tableEnd =3D ws->objectEnd; - ws->allocStart =3D ws->workspaceEnd; + ws->allocStart =3D ZSTD_cwksp_initialAllocStart(ws); ws->allocFailed =3D 0; - if (ws->phase > ZSTD_cwksp_alloc_buffers) { - ws->phase =3D ZSTD_cwksp_alloc_buffers; + if (ws->phase > ZSTD_cwksp_alloc_aligned_init_once) { + ws->phase =3D ZSTD_cwksp_alloc_aligned_init_once; } ZSTD_cwksp_assert_internal_consistency(ws); } =20 +MEM_STATIC size_t ZSTD_cwksp_sizeof(const ZSTD_cwksp* ws) { + return (size_t)((BYTE*)ws->workspaceEnd - (BYTE*)ws->workspace); +} + +MEM_STATIC size_t ZSTD_cwksp_used(const ZSTD_cwksp* ws) { + return (size_t)((BYTE*)ws->tableEnd - (BYTE*)ws->workspace) + + (size_t)((BYTE*)ws->workspaceEnd - (BYTE*)ws->allocStart); +} + /* * The provided workspace takes ownership of the buffer [start, start+size= ). * Any existing values in the workspace are ignored (the previously managed @@ -498,6 +575,7 @@ MEM_STATIC void ZSTD_cwksp_init(ZSTD_cwksp* ws, void* s= tart, size_t size, ZSTD_c ws->workspaceEnd =3D (BYTE*)start + size; ws->objectEnd =3D ws->workspace; ws->tableValidEnd =3D ws->objectEnd; + ws->initOnceStart =3D ZSTD_cwksp_initialAllocStart(ws); ws->phase =3D ZSTD_cwksp_alloc_objects; ws->isStatic =3D isStatic; ZSTD_cwksp_clear(ws); @@ -529,15 +607,6 @@ MEM_STATIC void ZSTD_cwksp_move(ZSTD_cwksp* dst, ZSTD_= cwksp* src) { ZSTD_memset(src, 0, sizeof(ZSTD_cwksp)); } =20 -MEM_STATIC size_t ZSTD_cwksp_sizeof(const ZSTD_cwksp* ws) { - return (size_t)((BYTE*)ws->workspaceEnd - (BYTE*)ws->workspace); -} - -MEM_STATIC size_t ZSTD_cwksp_used(const ZSTD_cwksp* ws) { - return (size_t)((BYTE*)ws->tableEnd - (BYTE*)ws->workspace) - + (size_t)((BYTE*)ws->workspaceEnd - (BYTE*)ws->allocStart); -} - MEM_STATIC int ZSTD_cwksp_reserve_failed(const ZSTD_cwksp* ws) { return ws->allocFailed; } @@ -550,17 +619,11 @@ MEM_STATIC int ZSTD_cwksp_reserve_failed(const ZSTD_c= wksp* ws) { * Returns if the estimated space needed for a wksp is within an acceptabl= e limit of the * actual amount of space used. 
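Because an alignment gap can now appear both before the tables and before the downward-growing segment, the reworked bound check below accepts any usage within 2 * ZSTD_CWKSP_ALIGNMENT_BYTES below the estimate, replacing the old +/-63-byte window. A toy version of the predicate, with made-up numbers:

#include <stddef.h>
#include <stdio.h>

#define ALIGNMENT 64
#define SLACK (2 * ALIGNMENT)  /* one gap per alignment boundary crossed */

static int withinBounds(size_t used, size_t estimated)
{
    return (estimated - SLACK) <= used && used <= estimated;
}

int main(void)
{
    printf("%d %d %d\n",
           withinBounds(1000, 1000),   /* 1: exact        */
           withinBounds(880,  1000),   /* 1: within slack */
           withinBounds(860,  1000));  /* 0: too far off  */
    return 0;
}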
*/ -MEM_STATIC int ZSTD_cwksp_estimated_space_within_bounds(const ZSTD_cwksp* = const ws, - size_t const estim= atedSpace, int resizedWorkspace) { - if (resizedWorkspace) { - /* Resized/newly allocated wksp should have exact bounds */ - return ZSTD_cwksp_used(ws) =3D=3D estimatedSpace; - } else { - /* Due to alignment, when reusing a workspace, we can actually con= sume 63 fewer or more bytes - * than estimatedSpace. See the comments in zstd_cwksp.h for detai= ls. - */ - return (ZSTD_cwksp_used(ws) >=3D estimatedSpace - 63) && (ZSTD_cwk= sp_used(ws) <=3D estimatedSpace + 63); - } +MEM_STATIC int ZSTD_cwksp_estimated_space_within_bounds(const ZSTD_cwksp *= const ws, size_t const estimatedSpace) { + /* We have an alignment space between objects and tables between table= s and buffers, so we can have up to twice + * the alignment bytes difference between estimation and actual usage = */ + return (estimatedSpace - ZSTD_cwksp_slack_space_required()) <=3D ZSTD_= cwksp_used(ws) && + ZSTD_cwksp_used(ws) <=3D estimatedSpace; } =20 =20 @@ -591,5 +654,4 @@ MEM_STATIC void ZSTD_cwksp_bump_oversized_duration( } } =20 - #endif /* ZSTD_CWKSP_H */ diff --git a/lib/zstd/compress/zstd_double_fast.c b/lib/zstd/compress/zstd_= double_fast.c index 76933dea2624..995e83f3a183 100644 --- a/lib/zstd/compress/zstd_double_fast.c +++ b/lib/zstd/compress/zstd_double_fast.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -11,8 +12,49 @@ #include "zstd_compress_internal.h" #include "zstd_double_fast.h" =20 +#ifndef ZSTD_EXCLUDE_DFAST_BLOCK_COMPRESSOR =20 -void ZSTD_fillDoubleHashTable(ZSTD_matchState_t* ms, +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +void ZSTD_fillDoubleHashTableForCDict(ZSTD_MatchState_t* ms, + void const* end, ZSTD_dictTableLoadMethod_e = dtlm) +{ + const ZSTD_compressionParameters* const cParams =3D &ms->cParams; + U32* const hashLarge =3D ms->hashTable; + U32 const hBitsL =3D cParams->hashLog + ZSTD_SHORT_CACHE_TAG_BITS; + U32 const mls =3D cParams->minMatch; + U32* const hashSmall =3D ms->chainTable; + U32 const hBitsS =3D cParams->chainLog + ZSTD_SHORT_CACHE_TAG_BITS; + const BYTE* const base =3D ms->window.base; + const BYTE* ip =3D base + ms->nextToUpdate; + const BYTE* const iend =3D ((const BYTE*)end) - HASH_READ_SIZE; + const U32 fastHashFillStep =3D 3; + + /* Always insert every fastHashFillStep position into the hash tables. + * Insert the other positions into the large hash table if their entry + * is empty. 
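The CDict fill path above stores "tagged" indices: the hash is computed with ZSTD_SHORT_CACHE_TAG_BITS extra bits, the high bits select the table slot, and the low bits ride along with the position so a later probe can reject most false candidates without dereferencing dictionary memory. A sketch of that packing, assuming a tag width of 8 bits (the constant's actual value is not shown in this hunk):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TAG_BITS 8  /* assumed stand-in for ZSTD_SHORT_CACHE_TAG_BITS */

/* Store the position with the hash's low TAG_BITS packed beside it. */
static void writeTaggedIndex(uint32_t* table, size_t hashAndTag, uint32_t index)
{
    uint32_t const tag = (uint32_t)hashAndTag & ((1u << TAG_BITS) - 1);
    table[hashAndTag >> TAG_BITS] = (index << TAG_BITS) | tag;
}

/* A probe first compares tags; only a tag hit warrants a memory access. */
static int tagsMatch(uint32_t packed, size_t hashAndTag)
{
    uint32_t const mask = (1u << TAG_BITS) - 1;
    return (packed & mask) == ((uint32_t)hashAndTag & mask);
}

int main(void)
{
    static uint32_t table[1u << 12];
    size_t const hashAndTag = 0xABCDEu;   /* pretend hash output */
    writeTaggedIndex(table, hashAndTag, 42);
    printf("tag match: %d, index: %u\n",
           tagsMatch(table[hashAndTag >> TAG_BITS], hashAndTag),
           table[hashAndTag >> TAG_BITS] >> TAG_BITS);
    return 0;
}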
+ */ + for (; ip + fastHashFillStep - 1 <=3D iend; ip +=3D fastHashFillStep) { + U32 const curr =3D (U32)(ip - base); + U32 i; + for (i =3D 0; i < fastHashFillStep; ++i) { + size_t const smHashAndTag =3D ZSTD_hashPtr(ip + i, hBitsS, mls= ); + size_t const lgHashAndTag =3D ZSTD_hashPtr(ip + i, hBitsL, 8); + if (i =3D=3D 0) { + ZSTD_writeTaggedIndex(hashSmall, smHashAndTag, curr + i); + } + if (i =3D=3D 0 || hashLarge[lgHashAndTag >> ZSTD_SHORT_CACHE_T= AG_BITS] =3D=3D 0) { + ZSTD_writeTaggedIndex(hashLarge, lgHashAndTag, curr + i); + } + /* Only load extra positions for ZSTD_dtlm_full */ + if (dtlm =3D=3D ZSTD_dtlm_fast) + break; + } } +} + +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +void ZSTD_fillDoubleHashTableForCCtx(ZSTD_MatchState_t* ms, void const* end, ZSTD_dictTableLoadMethod_e = dtlm) { const ZSTD_compressionParameters* const cParams =3D &ms->cParams; @@ -43,13 +85,26 @@ void ZSTD_fillDoubleHashTable(ZSTD_matchState_t* ms, /* Only load extra positions for ZSTD_dtlm_full */ if (dtlm =3D=3D ZSTD_dtlm_fast) break; - } } + } } +} + +void ZSTD_fillDoubleHashTable(ZSTD_MatchState_t* ms, + const void* const end, + ZSTD_dictTableLoadMethod_e dtlm, + ZSTD_tableFillPurpose_e tfp) +{ + if (tfp =3D=3D ZSTD_tfp_forCDict) { + ZSTD_fillDoubleHashTableForCDict(ms, end, dtlm); + } else { + ZSTD_fillDoubleHashTableForCCtx(ms, end, dtlm); + } } =20 =20 FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR size_t ZSTD_compressBlock_doubleFast_noDict_generic( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize, U32 const mls /* template */) { ZSTD_compressionParameters const* cParams =3D &ms->cParams; @@ -67,7 +122,7 @@ size_t ZSTD_compressBlock_doubleFast_noDict_generic( const BYTE* const iend =3D istart + srcSize; const BYTE* const ilimit =3D iend - HASH_READ_SIZE; U32 offset_1=3Drep[0], offset_2=3Drep[1]; - U32 offsetSaved =3D 0; + U32 offsetSaved1 =3D 0, offsetSaved2 =3D 0; =20 size_t mLength; U32 offset; @@ -88,9 +143,14 @@ size_t ZSTD_compressBlock_doubleFast_noDict_generic( const BYTE* matchl0; /* the long match for ip */ const BYTE* matchs0; /* the short match for ip */ const BYTE* matchl1; /* the long match for ip1 */ + const BYTE* matchs0_safe; /* matchs0 or safe address */ =20 const BYTE* ip =3D istart; /* the current position */ const BYTE* ip1; /* the next position */ + /* Array of ~random data, should have low probability of matching data + * we load from here instead of from tables, if matchl0/matchl1 are + * invalid indices. Used to avoid unpredictable branches. 
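The dummy array introduced above exists so that loads can proceed unconditionally: when an index is below the valid window, the code reads from the dummy bytes instead of branching, and folds validity into the subsequent match comparison. A simplified model of the ZSTD_selectAddr idea follows; the real helper takes the fallback address as a parameter, here it is baked in.

#include <stdio.h>

static const unsigned char dummy[] = {0x12,0x34,0x56,0x78,0x9a,0xbc,0xde,0xf0};

/* The ternary typically compiles to a conditional move, not a branch,
 * so the result is always a safely dereferenceable address. */
static const unsigned char* selectAddr(unsigned idx, unsigned lowLimit,
                                       const unsigned char* candidate)
{
    return idx >= lowLimit ? candidate : dummy;
}

int main(void)
{
    unsigned char window[16] = {0};
    const unsigned char* p1 = selectAddr(100, 50, window); /* valid: window  */
    const unsigned char* p2 = selectAddr(10, 50, window);  /* invalid: dummy */
    printf("%d %d\n", p1 == window, p2 == dummy);          /* prints: 1 1    */
    return 0;
}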
*/ + const BYTE dummy[] =3D {0x12,0x34,0x56,0x78,0x9a,0xbc,0xde,0xf0,0xe2,0= xb4}; =20 DEBUGLOG(5, "ZSTD_compressBlock_doubleFast_noDict_generic"); =20 @@ -100,8 +160,8 @@ size_t ZSTD_compressBlock_doubleFast_noDict_generic( U32 const current =3D (U32)(ip - base); U32 const windowLow =3D ZSTD_getLowestPrefixIndex(ms, current, cPa= rams->windowLog); U32 const maxRep =3D current - windowLow; - if (offset_2 > maxRep) offsetSaved =3D offset_2, offset_2 =3D 0; - if (offset_1 > maxRep) offsetSaved =3D offset_1, offset_1 =3D 0; + if (offset_2 > maxRep) offsetSaved2 =3D offset_2, offset_2 =3D 0; + if (offset_1 > maxRep) offsetSaved1 =3D offset_1, offset_1 =3D 0; } =20 /* Outer Loop: one iteration per match found and stored */ @@ -131,30 +191,35 @@ size_t ZSTD_compressBlock_doubleFast_noDict_generic( if ((offset_1 > 0) & (MEM_read32(ip+1-offset_1) =3D=3D MEM_rea= d32(ip+1))) { mLength =3D ZSTD_count(ip+1+4, ip+1+4-offset_1, iend) + 4; ip++; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend,= STORE_REPCODE_1, mLength); + ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend,= REPCODE1_TO_OFFBASE, mLength); goto _match_stored; } =20 hl1 =3D ZSTD_hashPtr(ip1, hBitsL, 8); =20 - if (idxl0 > prefixLowestIndex) { + /* idxl0 > prefixLowestIndex is a (somewhat) unpredictable bra= nch. + * However expression below compiles into conditional move. Si= nce + * match is unlikely and we only *branch* on idxl0 > prefixLow= estIndex + * if there is a match, all branches become predictable. */ + { const BYTE* const matchl0_safe =3D ZSTD_selectAddr(idxl0,= prefixLowestIndex, matchl0, &dummy[0]); + /* check prefix long match */ - if (MEM_read64(matchl0) =3D=3D MEM_read64(ip)) { + if (MEM_read64(matchl0_safe) =3D=3D MEM_read64(ip) && matc= hl0_safe =3D=3D matchl0) { mLength =3D ZSTD_count(ip+8, matchl0+8, iend) + 8; offset =3D (U32)(ip-matchl0); while (((ip>anchor) & (matchl0>prefixLowest)) && (ip[-= 1] =3D=3D matchl0[-1])) { ip--; matchl0--; mLength++; } /* catch up */ goto _match_found; - } - } + } } =20 idxl1 =3D hashLong[hl1]; matchl1 =3D base + idxl1; =20 - if (idxs0 > prefixLowestIndex) { - /* check prefix short match */ - if (MEM_read32(matchs0) =3D=3D MEM_read32(ip)) { - goto _search_next_long; - } + /* Same optimization as matchl0 above */ + matchs0_safe =3D ZSTD_selectAddr(idxs0, prefixLowestIndex, mat= chs0, &dummy[0]); + + /* check prefix short match */ + if(MEM_read32(matchs0_safe) =3D=3D MEM_read32(ip) && matchs0_s= afe =3D=3D matchs0) { + goto _search_next_long; } =20 if (ip1 >=3D nextStep) { @@ -175,30 +240,36 @@ size_t ZSTD_compressBlock_doubleFast_noDict_generic( } while (ip1 <=3D ilimit); =20 _cleanup: + /* If offset_1 started invalid (offsetSaved1 !=3D 0) and became va= lid (offset_1 !=3D 0), + * rotate saved offsets. See comment in ZSTD_compressBlock_fast_no= Dict for more context. */ + offsetSaved2 =3D ((offsetSaved1 !=3D 0) && (offset_1 !=3D 0)) ? of= fsetSaved1 : offsetSaved2; + /* save reps for next block */ - rep[0] =3D offset_1 ? offset_1 : offsetSaved; - rep[1] =3D offset_2 ? offset_2 : offsetSaved; + rep[0] =3D offset_1 ? offset_1 : offsetSaved1; + rep[1] =3D offset_2 ?
offset_2 : offsetSaved2; =20 /* Return the last literals size */ return (size_t)(iend - anchor); =20 _search_next_long: =20 - /* check prefix long +1 match */ - if (idxl1 > prefixLowestIndex) { - if (MEM_read64(matchl1) =3D=3D MEM_read64(ip1)) { + /* short match found: let's check for a longer one */ + mLength =3D ZSTD_count(ip+4, matchs0+4, iend) + 4; + offset =3D (U32)(ip - matchs0); + + /* check long match at +1 position */ + if ((idxl1 > prefixLowestIndex) && (MEM_read64(matchl1) =3D=3D MEM= _read64(ip1))) { + size_t const l1len =3D ZSTD_count(ip1+8, matchl1+8, iend) + 8; + if (l1len > mLength) { + /* use the long match instead */ ip =3D ip1; - mLength =3D ZSTD_count(ip+8, matchl1+8, iend) + 8; + mLength =3D l1len; offset =3D (U32)(ip-matchl1); - while (((ip>anchor) & (matchl1>prefixLowest)) && (ip[-1] = =3D=3D matchl1[-1])) { ip--; matchl1--; mLength++; } /* catch up */ - goto _match_found; + matchs0 =3D matchl1; } } =20 - /* if no long +1 match, explore the short match we found */ - mLength =3D ZSTD_count(ip+4, matchs0+4, iend) + 4; - offset =3D (U32)(ip - matchs0); - while (((ip>anchor) & (matchs0>prefixLowest)) && (ip[-1] =3D=3D ma= tchs0[-1])) { ip--; matchs0--; mLength++; } /* catch up */ + while (((ip>anchor) & (matchs0>prefixLowest)) && (ip[-1] =3D=3D ma= tchs0[-1])) { ip--; matchs0--; mLength++; } /* complete backward */ =20 /* fall-through */ =20 @@ -217,7 +288,7 @@ size_t ZSTD_compressBlock_doubleFast_noDict_generic( hashLong[hl1] =3D (U32)(ip1 - base); } =20 - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, STORE_O= FFSET(offset), mLength); + ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, OFFSET_= TO_OFFBASE(offset), mLength); =20 _match_stored: /* match found */ @@ -243,7 +314,7 @@ size_t ZSTD_compressBlock_doubleFast_noDict_generic( U32 const tmpOff =3D offset_2; offset_2 =3D offset_1; offs= et_1 =3D tmpOff; /* swap offset_2 <=3D> offset_1 */ hashSmall[ZSTD_hashPtr(ip, hBitsS, mls)] =3D (U32)(ip-base= ); hashLong[ZSTD_hashPtr(ip, hBitsL, 8)] =3D (U32)(ip-base); - ZSTD_storeSeq(seqStore, 0, anchor, iend, STORE_REPCODE_1, = rLength); + ZSTD_storeSeq(seqStore, 0, anchor, iend, REPCODE1_TO_OFFBA= SE, rLength); ip +=3D rLength; anchor =3D ip; continue; /* faster when present ... (?) 
*/ @@ -254,8 +325,9 @@ size_t ZSTD_compressBlock_doubleFast_noDict_generic( =20 =20 FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR size_t ZSTD_compressBlock_doubleFast_dictMatchState_generic( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize, U32 const mls /* template */) { @@ -275,9 +347,8 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_gen= eric( const BYTE* const iend =3D istart + srcSize; const BYTE* const ilimit =3D iend - HASH_READ_SIZE; U32 offset_1=3Drep[0], offset_2=3Drep[1]; - U32 offsetSaved =3D 0; =20 - const ZSTD_matchState_t* const dms =3D ms->dictMatchState; + const ZSTD_MatchState_t* const dms =3D ms->dictMatchState; const ZSTD_compressionParameters* const dictCParams =3D &dms->cParams; const U32* const dictHashLong =3D dms->hashTable; const U32* const dictHashSmall =3D dms->chainTable; @@ -286,8 +357,8 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_gen= eric( const BYTE* const dictStart =3D dictBase + dictStartIndex; const BYTE* const dictEnd =3D dms->window.nextSrc; const U32 dictIndexDelta =3D prefixLowestIndex - (U32)(dictEnd -= dictBase); - const U32 dictHBitsL =3D dictCParams->hashLog; - const U32 dictHBitsS =3D dictCParams->chainLog; + const U32 dictHBitsL =3D dictCParams->hashLog + ZSTD_SHORT_C= ACHE_TAG_BITS; + const U32 dictHBitsS =3D dictCParams->chainLog + ZSTD_SHORT_= CACHE_TAG_BITS; const U32 dictAndPrefixLength =3D (U32)((ip - prefixLowest) + (dictEn= d - dictStart)); =20 DEBUGLOG(5, "ZSTD_compressBlock_doubleFast_dictMatchState_generic"); @@ -295,6 +366,13 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_ge= neric( /* if a dictionary is attached, it must be within window range */ assert(ms->window.dictLimit + (1U << cParams->windowLog) >=3D endIndex= ); =20 + if (ms->prefetchCDictTables) { + size_t const hashTableBytes =3D (((size_t)1) << dictCParams->hashL= og) * sizeof(U32); + size_t const chainTableBytes =3D (((size_t)1) << dictCParams->chai= nLog) * sizeof(U32); + PREFETCH_AREA(dictHashLong, hashTableBytes); + PREFETCH_AREA(dictHashSmall, chainTableBytes); + } + /* init */ ip +=3D (dictAndPrefixLength =3D=3D 0); =20 @@ -309,8 +387,12 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_ge= neric( U32 offset; size_t const h2 =3D ZSTD_hashPtr(ip, hBitsL, 8); size_t const h =3D ZSTD_hashPtr(ip, hBitsS, mls); - size_t const dictHL =3D ZSTD_hashPtr(ip, dictHBitsL, 8); - size_t const dictHS =3D ZSTD_hashPtr(ip, dictHBitsS, mls); + size_t const dictHashAndTagL =3D ZSTD_hashPtr(ip, dictHBitsL, 8); + size_t const dictHashAndTagS =3D ZSTD_hashPtr(ip, dictHBitsS, mls); + U32 const dictMatchIndexAndTagL =3D dictHashLong[dictHashAndTagL >= > ZSTD_SHORT_CACHE_TAG_BITS]; + U32 const dictMatchIndexAndTagS =3D dictHashSmall[dictHashAndTagS = >> ZSTD_SHORT_CACHE_TAG_BITS]; + int const dictTagsMatchL =3D ZSTD_comparePackedTags(dictMatchIndex= AndTagL, dictHashAndTagL); + int const dictTagsMatchS =3D ZSTD_comparePackedTags(dictMatchIndex= AndTagS, dictHashAndTagS); U32 const curr =3D (U32)(ip-base); U32 const matchIndexL =3D hashLong[h2]; U32 matchIndexS =3D hashSmall[h]; @@ -323,26 +405,24 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_g= eneric( hashLong[h2] =3D hashSmall[h] =3D curr; /* update hash tables */ =20 /* check repcode */ - if (((U32)((prefixLowestIndex-1) - repIndex) >=3D 3 /* intentional= underflow */) + if ((ZSTD_index_overlap_check(prefixLowestIndex, repIndex)) && (MEM_read32(repMatch) =3D=3D 
MEM_read32(ip+1)) ) { const BYTE* repMatchEnd =3D repIndex < prefixLowestIndex ? dic= tEnd : iend; mLength =3D ZSTD_count_2segments(ip+1+4, repMatch+4, iend, rep= MatchEnd, prefixLowest) + 4; ip++; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, STO= RE_REPCODE_1, mLength); + ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, REP= CODE1_TO_OFFBASE, mLength); goto _match_stored; } =20 - if (matchIndexL > prefixLowestIndex) { + if ((matchIndexL >=3D prefixLowestIndex) && (MEM_read64(matchLong)= =3D=3D MEM_read64(ip))) { /* check prefix long match */ - if (MEM_read64(matchLong) =3D=3D MEM_read64(ip)) { - mLength =3D ZSTD_count(ip+8, matchLong+8, iend) + 8; - offset =3D (U32)(ip-matchLong); - while (((ip>anchor) & (matchLong>prefixLowest)) && (ip[-1]= =3D=3D matchLong[-1])) { ip--; matchLong--; mLength++; } /* catch up */ - goto _match_found; - } - } else { + mLength =3D ZSTD_count(ip+8, matchLong+8, iend) + 8; + offset =3D (U32)(ip-matchLong); + while (((ip>anchor) & (matchLong>prefixLowest)) && (ip[-1] =3D= =3D matchLong[-1])) { ip--; matchLong--; mLength++; } /* catch up */ + goto _match_found; + } else if (dictTagsMatchL) { /* check dictMatchState long match */ - U32 const dictMatchIndexL =3D dictHashLong[dictHL]; + U32 const dictMatchIndexL =3D dictMatchIndexAndTagL >> ZSTD_SH= ORT_CACHE_TAG_BITS; const BYTE* dictMatchL =3D dictBase + dictMatchIndexL; assert(dictMatchL < dictEnd); =20 @@ -354,13 +434,13 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_g= eneric( } } =20 if (matchIndexS > prefixLowestIndex) { - /* check prefix short match */ + /* short match candidate */ if (MEM_read32(match) =3D=3D MEM_read32(ip)) { goto _search_next_long; } - } else { + } else if (dictTagsMatchS) { /* check dictMatchState short match */ - U32 const dictMatchIndexS =3D dictHashSmall[dictHS]; + U32 const dictMatchIndexS =3D dictMatchIndexAndTagS >> ZSTD_SH= ORT_CACHE_TAG_BITS; match =3D dictBase + dictMatchIndexS; matchIndexS =3D dictMatchIndexS + dictIndexDelta; =20 @@ -375,25 +455,24 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_g= eneric( continue; =20 _search_next_long: - { size_t const hl3 =3D ZSTD_hashPtr(ip+1, hBitsL, 8); - size_t const dictHLNext =3D ZSTD_hashPtr(ip+1, dictHBitsL, 8); + size_t const dictHashAndTagL3 =3D ZSTD_hashPtr(ip+1, dictHBits= L, 8); U32 const matchIndexL3 =3D hashLong[hl3]; + U32 const dictMatchIndexAndTagL3 =3D dictHashLong[dictHashAndT= agL3 >> ZSTD_SHORT_CACHE_TAG_BITS]; + int const dictTagsMatchL3 =3D ZSTD_comparePackedTags(dictMatch= IndexAndTagL3, dictHashAndTagL3); const BYTE* matchL3 =3D base + matchIndexL3; hashLong[hl3] =3D curr + 1; =20 /* check prefix long +1 match */ - if (matchIndexL3 > prefixLowestIndex) { - if (MEM_read64(matchL3) =3D=3D MEM_read64(ip+1)) { - mLength =3D ZSTD_count(ip+9, matchL3+8, iend) + 8; - ip++; - offset =3D (U32)(ip-matchL3); - while (((ip>anchor) & (matchL3>prefixLowest)) && (ip[-= 1] =3D=3D matchL3[-1])) { ip--; matchL3--; mLength++; } /* catch up */ - goto _match_found; - } - } else { + if ((matchIndexL3 >=3D prefixLowestIndex) && (MEM_read64(match= L3) =3D=3D MEM_read64(ip+1))) { + mLength =3D ZSTD_count(ip+9, matchL3+8, iend) + 8; + ip++; + offset =3D (U32)(ip-matchL3); + while (((ip>anchor) & (matchL3>prefixLowest)) && (ip[-1] = =3D=3D matchL3[-1])) { ip--; matchL3--; mLength++; } /* catch up */ + goto _match_found; + } else if (dictTagsMatchL3) { /* check dict long +1 match */ - U32 const dictMatchIndexL3 =3D dictHashLong[dictHLNext]; + U32 const dictMatchIndexL3 =3D dictMatchIndexAndTagL3 >> 
Z= STD_SHORT_CACHE_TAG_BITS; const BYTE* dictMatchL3 =3D dictBase + dictMatchIndexL3; assert(dictMatchL3 < dictEnd); if (dictMatchL3 > dictStart && MEM_read64(dictMatchL3) =3D= =3D MEM_read64(ip+1)) { @@ -419,7 +498,7 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_gen= eric( offset_2 =3D offset_1; offset_1 =3D offset; =20 - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, STORE_O= FFSET(offset), mLength); + ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, OFFSET_= TO_OFFBASE(offset), mLength); =20 _match_stored: /* match found */ @@ -443,12 +522,12 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_g= eneric( const BYTE* repMatch2 =3D repIndex2 < prefixLowestIndex ? dictBase + repIndex2 - dictIndexDelta : base + repIndex2; - if ( ((U32)((prefixLowestIndex-1) - (U32)repIndex2) >=3D 3= /* intentional overflow */) + if ( (ZSTD_index_overlap_check(prefixLowestIndex, repIndex= 2)) && (MEM_read32(repMatch2) =3D=3D MEM_read32(ip)) ) { const BYTE* const repEnd2 =3D repIndex2 < prefixLowest= Index ? dictEnd : iend; size_t const repLength2 =3D ZSTD_count_2segments(ip+4,= repMatch2+4, iend, repEnd2, prefixLowest) + 4; U32 tmpOffset =3D offset_2; offset_2 =3D offset_1; off= set_1 =3D tmpOffset; /* swap offset_2 <=3D> offset_1 */ - ZSTD_storeSeq(seqStore, 0, anchor, iend, STORE_REPCODE= _1, repLength2); + ZSTD_storeSeq(seqStore, 0, anchor, iend, REPCODE1_TO_O= FFBASE, repLength2); hashSmall[ZSTD_hashPtr(ip, hBitsS, mls)] =3D current2; hashLong[ZSTD_hashPtr(ip, hBitsL, 8)] =3D current2; ip +=3D repLength2; @@ -461,8 +540,8 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_gen= eric( } /* while (ip < ilimit) */ =20 /* save reps for next block */ - rep[0] =3D offset_1 ? offset_1 : offsetSaved; - rep[1] =3D offset_2 ? offset_2 : offsetSaved; + rep[0] =3D offset_1; + rep[1] =3D offset_2; =20 /* Return the last literals size */ return (size_t)(iend - anchor); @@ -470,7 +549,7 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState_gen= eric( =20 #define ZSTD_GEN_DFAST_FN(dictMode, mls) = \ static size_t ZSTD_compressBlock_doubleFast_##dictMode##_##mls( = \ - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_= NUM], \ + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_= NUM], \ void const* src, size_t srcSize) = \ { = \ return ZSTD_compressBlock_doubleFast_##dictMode##_generic(ms, seqS= tore, rep, src, srcSize, mls); \ @@ -488,7 +567,7 @@ ZSTD_GEN_DFAST_FN(dictMatchState, 7) =20 =20 size_t ZSTD_compressBlock_doubleFast( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { const U32 mls =3D ms->cParams.minMatch; @@ -508,7 +587,7 @@ size_t ZSTD_compressBlock_doubleFast( =20 =20 size_t ZSTD_compressBlock_doubleFast_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { const U32 mls =3D ms->cParams.minMatch; @@ -527,8 +606,10 @@ size_t ZSTD_compressBlock_doubleFast_dictMatchState( } =20 =20 -static size_t ZSTD_compressBlock_doubleFast_extDict_generic( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t ZSTD_compressBlock_doubleFast_extDict_generic( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize, U32 const mls /* template */) { @@ -579,13 +660,13 @@ static size_t 
ZSTD_compressBlock_doubleFast_extDict_g= eneric( size_t mLength; hashSmall[hSmall] =3D hashLong[hLong] =3D curr; /* update hash t= able */ =20 - if ((((U32)((prefixStartIndex-1) - repIndex) >=3D 3) /* intentiona= l underflow : ensure repIndex doesn't overlap dict + prefix */ + if (((ZSTD_index_overlap_check(prefixStartIndex, repIndex)) & (offset_1 <=3D curr+1 - dictStartIndex)) /* note: we are sea= rching at curr+1 */ && (MEM_read32(repMatch) =3D=3D MEM_read32(ip+1)) ) { const BYTE* repMatchEnd =3D repIndex < prefixStartIndex ? dict= End : iend; mLength =3D ZSTD_count_2segments(ip+1+4, repMatch+4, iend, rep= MatchEnd, prefixStart) + 4; ip++; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, STO= RE_REPCODE_1, mLength); + ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, REP= CODE1_TO_OFFBASE, mLength); } else { if ((matchLongIndex > dictStartIndex) && (MEM_read64(matchLong= ) =3D=3D MEM_read64(ip))) { const BYTE* const matchEnd =3D matchLongIndex < prefixStar= tIndex ? dictEnd : iend; @@ -596,7 +677,7 @@ static size_t ZSTD_compressBlock_doubleFast_extDict_gen= eric( while (((ip>anchor) & (matchLong>lowMatchPtr)) && (ip[-1] = =3D=3D matchLong[-1])) { ip--; matchLong--; mLength++; } /* catch up */ offset_2 =3D offset_1; offset_1 =3D offset; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend,= STORE_OFFSET(offset), mLength); + ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend,= OFFSET_TO_OFFBASE(offset), mLength); =20 } else if ((matchIndex > dictStartIndex) && (MEM_read32(match)= =3D=3D MEM_read32(ip))) { size_t const h3 =3D ZSTD_hashPtr(ip+1, hBitsL, 8); @@ -621,7 +702,7 @@ static size_t ZSTD_compressBlock_doubleFast_extDict_gen= eric( } offset_2 =3D offset_1; offset_1 =3D offset; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend,= STORE_OFFSET(offset), mLength); + ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend,= OFFSET_TO_OFFBASE(offset), mLength); =20 } else { ip +=3D ((ip-anchor) >> kSearchStrength) + 1; @@ -647,13 +728,13 @@ static size_t ZSTD_compressBlock_doubleFast_extDict_g= eneric( U32 const current2 =3D (U32)(ip-base); U32 const repIndex2 =3D current2 - offset_2; const BYTE* repMatch2 =3D repIndex2 < prefixStartIndex ? d= ictBase + repIndex2 : base + repIndex2; - if ( (((U32)((prefixStartIndex-1) - repIndex2) >=3D 3) /= * intentional overflow : ensure repIndex2 doesn't overlap dict + prefix */ + if ( ((ZSTD_index_overlap_check(prefixStartIndex, repIndex= 2)) & (offset_2 <=3D current2 - dictStartIndex)) && (MEM_read32(repMatch2) =3D=3D MEM_read32(ip)) ) { const BYTE* const repEnd2 =3D repIndex2 < prefixStartI= ndex ? 
dictEnd : iend; size_t const repLength2 =3D ZSTD_count_2segments(ip+4,= repMatch2+4, iend, repEnd2, prefixStart) + 4; U32 const tmpOffset =3D offset_2; offset_2 =3D offset_= 1; offset_1 =3D tmpOffset; /* swap offset_2 <=3D> offset_1 */ - ZSTD_storeSeq(seqStore, 0, anchor, iend, STORE_REPCODE= _1, repLength2); + ZSTD_storeSeq(seqStore, 0, anchor, iend, REPCODE1_TO_O= FFBASE, repLength2); hashSmall[ZSTD_hashPtr(ip, hBitsS, mls)] =3D current2; hashLong[ZSTD_hashPtr(ip, hBitsL, 8)] =3D current2; ip +=3D repLength2; @@ -677,7 +758,7 @@ ZSTD_GEN_DFAST_FN(extDict, 6) ZSTD_GEN_DFAST_FN(extDict, 7) =20 size_t ZSTD_compressBlock_doubleFast_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { U32 const mls =3D ms->cParams.minMatch; @@ -694,3 +775,5 @@ size_t ZSTD_compressBlock_doubleFast_extDict( return ZSTD_compressBlock_doubleFast_extDict_7(ms, seqStore, rep, = src, srcSize); } } + +#endif /* ZSTD_EXCLUDE_DFAST_BLOCK_COMPRESSOR */ diff --git a/lib/zstd/compress/zstd_double_fast.h b/lib/zstd/compress/zstd_= double_fast.h index 6822bde65a1d..011556ce56f7 100644 --- a/lib/zstd/compress/zstd_double_fast.h +++ b/lib/zstd/compress/zstd_double_fast.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -11,22 +12,32 @@ #ifndef ZSTD_DOUBLE_FAST_H #define ZSTD_DOUBLE_FAST_H =20 - #include "../common/mem.h" /* U32 */ #include "zstd_compress_internal.h" /* ZSTD_CCtx, size_t */ =20 -void ZSTD_fillDoubleHashTable(ZSTD_matchState_t* ms, - void const* end, ZSTD_dictTableLoadMethod_e = dtlm); +#ifndef ZSTD_EXCLUDE_DFAST_BLOCK_COMPRESSOR + +void ZSTD_fillDoubleHashTable(ZSTD_MatchState_t* ms, + void const* end, ZSTD_dictTableLoadMethod_e = dtlm, + ZSTD_tableFillPurpose_e tfp); + size_t ZSTD_compressBlock_doubleFast( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_doubleFast_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_doubleFast_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); =20 - +#define ZSTD_COMPRESSBLOCK_DOUBLEFAST ZSTD_compressBlock_doubleFast +#define ZSTD_COMPRESSBLOCK_DOUBLEFAST_DICTMATCHSTATE ZSTD_compressBlock_do= ubleFast_dictMatchState +#define ZSTD_COMPRESSBLOCK_DOUBLEFAST_EXTDICT ZSTD_compressBlock_doubleFas= t_extDict +#else +#define ZSTD_COMPRESSBLOCK_DOUBLEFAST NULL +#define ZSTD_COMPRESSBLOCK_DOUBLEFAST_DICTMATCHSTATE NULL +#define ZSTD_COMPRESSBLOCK_DOUBLEFAST_EXTDICT NULL +#endif /* ZSTD_EXCLUDE_DFAST_BLOCK_COMPRESSOR */ =20 #endif /* ZSTD_DOUBLE_FAST_H */ diff --git a/lib/zstd/compress/zstd_fast.c b/lib/zstd/compress/zstd_fast.c index a752e6beab52..60e07e839e5f 100644 --- a/lib/zstd/compress/zstd_fast.c +++ b/lib/zstd/compress/zstd_fast.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. 
diff --git a/lib/zstd/compress/zstd_fast.c b/lib/zstd/compress/zstd_fast.c
index a752e6beab52..60e07e839e5f 100644
--- a/lib/zstd/compress/zstd_fast.c
+++ b/lib/zstd/compress/zstd_fast.c
@@ -1,5 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -11,8 +12,46 @@
 #include "zstd_compress_internal.h"  /* ZSTD_hashPtr, ZSTD_count, ZSTD_storeSeq */
 #include "zstd_fast.h"

+static
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+void ZSTD_fillHashTableForCDict(ZSTD_MatchState_t* ms,
+                        const void* const end,
+                        ZSTD_dictTableLoadMethod_e dtlm)
+{
+    const ZSTD_compressionParameters* const cParams = &ms->cParams;
+    U32* const hashTable = ms->hashTable;
+    U32 const hBits = cParams->hashLog + ZSTD_SHORT_CACHE_TAG_BITS;
+    U32 const mls = cParams->minMatch;
+    const BYTE* const base = ms->window.base;
+    const BYTE* ip = base + ms->nextToUpdate;
+    const BYTE* const iend = ((const BYTE*)end) - HASH_READ_SIZE;
+    const U32 fastHashFillStep = 3;
+
+    /* Currently, we always use ZSTD_dtlm_full for filling CDict tables.
+     * Feel free to remove this assert if there's a good reason! */
+    assert(dtlm == ZSTD_dtlm_full);
+
+    /* Always insert every fastHashFillStep position into the hash table.
+     * Insert the other positions if their hash entry is empty.
+     */
+    for ( ; ip + fastHashFillStep < iend + 2; ip += fastHashFillStep) {
+        U32 const curr = (U32)(ip - base);
+        {   size_t const hashAndTag = ZSTD_hashPtr(ip, hBits, mls);
+            ZSTD_writeTaggedIndex(hashTable, hashAndTag, curr);   }
+
+        if (dtlm == ZSTD_dtlm_fast) continue;
+        /* Only load extra positions for ZSTD_dtlm_full */
+        {   U32 p;
+            for (p = 1; p < fastHashFillStep; ++p) {
+                size_t const hashAndTag = ZSTD_hashPtr(ip + p, hBits, mls);
+                if (hashTable[hashAndTag >> ZSTD_SHORT_CACHE_TAG_BITS] == 0) {  /* not yet filled */
+                    ZSTD_writeTaggedIndex(hashTable, hashAndTag, curr + p);
+    }   }   }   }
+}

-void ZSTD_fillHashTable(ZSTD_matchState_t* ms,
+static
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+void ZSTD_fillHashTableForCCtx(ZSTD_MatchState_t* ms,
                         const void* const end,
                         ZSTD_dictTableLoadMethod_e dtlm)
 {
@@ -25,6 +64,10 @@ void ZSTD_fillHashTable(ZSTD_matchState_t* ms,
     const BYTE* const iend = ((const BYTE*)end) - HASH_READ_SIZE;
     const U32 fastHashFillStep = 3;

+    /* Currently, we always use ZSTD_dtlm_fast for filling CCtx tables.
+     * Feel free to remove this assert if there's a good reason! */
+    assert(dtlm == ZSTD_dtlm_fast);
+
     /* Always insert every fastHashFillStep position into the hash table.
      * Insert the other positions if their hash entry is empty.
      */
@@ -42,6 +85,60 @@ void ZSTD_fillHashTable(ZSTD_matchState_t* ms,
     }   }   }   }
 }

+void ZSTD_fillHashTable(ZSTD_MatchState_t* ms,
+                        const void* const end,
+                        ZSTD_dictTableLoadMethod_e dtlm,
+                        ZSTD_tableFillPurpose_e tfp)
+{
+    if (tfp == ZSTD_tfp_forCDict) {
+        ZSTD_fillHashTableForCDict(ms, end, dtlm);
+    } else {
+        ZSTD_fillHashTableForCCtx(ms, end, dtlm);
+    }
+}
+
+
+typedef int (*ZSTD_match4Found) (const BYTE* currentPtr, const BYTE* matchAddress, U32 matchIdx, U32 idxLowLimit);
+
+static int
+ZSTD_match4Found_cmov(const BYTE* currentPtr, const BYTE* matchAddress, U32 matchIdx, U32 idxLowLimit)
+{
+    /* Array of ~random data, should have low probability of matching data.
+     * Load from here if the index is invalid.
+     * Used to avoid unpredictable branches. */
+    static const BYTE dummy[] = {0x12,0x34,0x56,0x78};
+
+    /* currentIdx >= lowLimit is a (somewhat) unpredictable branch.
+     * However expression below compiles into conditional move.
+     */
+    const BYTE* mvalAddr = ZSTD_selectAddr(matchIdx, idxLowLimit, matchAddress, dummy);
+    /* Note: this used to be written as : return test1 && test2;
+     * Unfortunately, once inlined, these tests become branches,
+     * in which case it becomes critical that they are executed in the right order (test1 then test2).
+     * So we have to write these tests in a specific manner to ensure their ordering.
+     */
+    if (MEM_read32(currentPtr) != MEM_read32(mvalAddr)) return 0;
+    /* force ordering of these tests, which matters once the function is inlined, as they become branches */
+    __asm__("");
+    return matchIdx >= idxLowLimit;
+}
+
+static int
+ZSTD_match4Found_branch(const BYTE* currentPtr, const BYTE* matchAddress, U32 matchIdx, U32 idxLowLimit)
+{
+    /* using a branch instead of a cmov,
+     * because it's faster in scenarios where matchIdx >= idxLowLimit is generally true,
+     * aka almost all candidates are within range */
+    U32 mval;
+    if (matchIdx >= idxLowLimit) {
+        mval = MEM_read32(matchAddress);
+    } else {
+        mval = MEM_read32(currentPtr) ^ 1; /* guaranteed to not match. */
+    }
+
+    return (MEM_read32(currentPtr) == mval);
+}
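These two helpers answer the same question — "do the 4 bytes at the current position match the candidate, and is the candidate index in range?" — with opposite trade-offs: the cmov variant replaces the range check with a select between the real match address and a dummy buffer, while the branch variant keeps the test for workloads where it predicts well. A standalone sketch of the select idiom (hypothetical names; the real code relies on ZSTD_selectAddr plus an empty asm to pin the test order):

    /* Sketch: branchless candidate check via an address select. */
    #include <string.h>   /* memcpy */

    static const unsigned char dummy4[4] = { 0x12, 0x34, 0x56, 0x78 };

    static inline const unsigned char*
    select_addr(unsigned idx, unsigned lowLimit,
                const unsigned char* candidate, const unsigned char* fallback)
    {
        /* typically compiled to a conditional move, so the 4-byte
         * compare below never waits on a mispredicted branch */
        return (idx >= lowLimit) ? candidate : fallback;
    }

    static inline int
    match4_found(const unsigned char* ip, const unsigned char* cand,
                 unsigned idx, unsigned lowLimit)
    {
        const unsigned char* p = select_addr(idx, lowLimit, cand, dummy4);
        unsigned a, b;
        memcpy(&a, ip, 4);   /* unaligned-safe 4-byte loads */
        memcpy(&b, p, 4);
        return (a == b) && (idx >= lowLimit);
    }

The dummy load makes the first comparison valid even for out-of-range indices; note the sketch's plain `&&` is logically correct but, as the patch's own comment warns, may compile back into branches once inlined, which is exactly why the real code sequences the two tests explicitly. Further down, ZSTD_compressBlock_fast picks the cmov variant for small windows (windowLog < 19), where candidates fall out of range often enough to make the branch unpredictable.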

 /*
  * If you squint hard enough (and ignore repcodes), the search operation at any
@@ -89,17 +186,17 @@ void ZSTD_fillHashTable(ZSTD_matchState_t* ms,
  *
  * This is also the work we do at the beginning to enter the loop initially.
  */
-FORCE_INLINE_TEMPLATE size_t
-ZSTD_compressBlock_fast_noDict_generic(
-        ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
+FORCE_INLINE_TEMPLATE
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+size_t ZSTD_compressBlock_fast_noDict_generic(
+        ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM],
         void const* src, size_t srcSize,
-        U32 const mls, U32 const hasStep)
+        U32 const mls, int useCmov)
 {
     const ZSTD_compressionParameters* const cParams = &ms->cParams;
     U32* const hashTable = ms->hashTable;
     U32 const hlog = cParams->hashLog;
-    /* support stepSize of 0 */
-    size_t const stepSize = hasStep ? (cParams->targetLength + !(cParams->targetLength) + 1) : 2;
+    size_t const stepSize = cParams->targetLength + !(cParams->targetLength) + 1; /* min 2 */
     const BYTE* const base = ms->window.base;
     const BYTE* const istart = (const BYTE*)src;
     const U32 endIndex = (U32)((size_t)(istart - base) + srcSize);
@@ -117,12 +214,11 @@ ZSTD_compressBlock_fast_noDict_generic(

     U32 rep_offset1 = rep[0];
     U32 rep_offset2 = rep[1];
-    U32 offsetSaved = 0;
+    U32 offsetSaved1 = 0, offsetSaved2 = 0;

     size_t hash0; /* hash for ip0 */
     size_t hash1; /* hash for ip1 */
-    U32 idx; /* match idx for ip0 */
-    U32 mval; /* src value at match idx */
+    U32 matchIdx; /* match idx for ip0 */

     U32 offcode;
     const BYTE* match0;
@@ -135,14 +231,15 @@ ZSTD_compressBlock_fast_noDict_generic(
     size_t step;
     const BYTE* nextStep;
     const size_t kStepIncr = (1 << (kSearchStrength - 1));
+    const ZSTD_match4Found matchFound = useCmov ? ZSTD_match4Found_cmov : ZSTD_match4Found_branch;

     DEBUGLOG(5, "ZSTD_compressBlock_fast_generic");
     ip0 += (ip0 == prefixStart);
     {   U32 const curr = (U32)(ip0 - base);
         U32 const windowLow = ZSTD_getLowestPrefixIndex(ms, curr, cParams->windowLog);
         U32 const maxRep = curr - windowLow;
-        if (rep_offset2 > maxRep) offsetSaved = rep_offset2, rep_offset2 = 0;
-        if (rep_offset1 > maxRep) offsetSaved = rep_offset1, rep_offset1 = 0;
+        if (rep_offset2 > maxRep) offsetSaved2 = rep_offset2, rep_offset2 = 0;
+        if (rep_offset1 > maxRep) offsetSaved1 = rep_offset1, rep_offset1 = 0;
     }

     /* start each op */
@@ -163,7 +260,7 @@ ZSTD_compressBlock_fast_noDict_generic(
     hash0 = ZSTD_hashPtr(ip0, hlog, mls);
     hash1 = ZSTD_hashPtr(ip1, hlog, mls);

-    idx = hashTable[hash0];
+    matchIdx = hashTable[hash0];

     do {
         /* load repcode match for ip[2]*/
@@ -180,26 +277,28 @@ ZSTD_compressBlock_fast_noDict_generic(
             mLength = ip0[-1] == match0[-1];
             ip0 -= mLength;
             match0 -= mLength;
-            offcode = STORE_REPCODE_1;
+            offcode = REPCODE1_TO_OFFBASE;
             mLength += 4;
+
+            /* Write next hash table entry: it's already calculated.
+             * This write is known to be safe because ip1 is before the
+             * repcode (ip2). */
+            hashTable[hash1] = (U32)(ip1 - base);
+
             goto _match;
         }

-        /* load match for ip[0] */
-        if (idx >= prefixStartIndex) {
-            mval = MEM_read32(base + idx);
-        } else {
-            mval = MEM_read32(ip0) ^ 1; /* guaranteed to not match. */
-        }
+        if (matchFound(ip0, base + matchIdx, matchIdx, prefixStartIndex)) {
+            /* Write next hash table entry (it's already calculated).
+             * This write is known to be safe because the ip1 == ip0 + 1,
+             * so searching will resume after ip1 */
+            hashTable[hash1] = (U32)(ip1 - base);

-        /* check match at ip[0] */
-        if (MEM_read32(ip0) == mval) {
-            /* found a match! */
             goto _offset;
         }

         /* lookup ip[1] */
-        idx = hashTable[hash1];
+        matchIdx = hashTable[hash1];

         /* hash ip[2] */
         hash0 = hash1;
@@ -214,21 +313,19 @@ ZSTD_compressBlock_fast_noDict_generic(
         current0 = (U32)(ip0 - base);
         hashTable[hash0] = current0;

-        /* load match for ip[0] */
-        if (idx >= prefixStartIndex) {
-            mval = MEM_read32(base + idx);
-        } else {
-            mval = MEM_read32(ip0) ^ 1; /* guaranteed to not match. */
-        }
-
-        /* check match at ip[0] */
-        if (MEM_read32(ip0) == mval) {
-            /* found a match! */
+        if (matchFound(ip0, base + matchIdx, matchIdx, prefixStartIndex)) {
+            /* Write next hash table entry, since it's already calculated */
+            if (step <= 4) {
+                /* Avoid writing an index if it's >= position where search will resume.
+                 * The minimum possible match has length 4, so search can resume at ip0 + 4.
+                 */
+                hashTable[hash1] = (U32)(ip1 - base);
+            }
             goto _offset;
         }

         /* lookup ip[1] */
-        idx = hashTable[hash1];
+        matchIdx = hashTable[hash1];

         /* hash ip[2] */
         hash0 = hash1;
@@ -250,13 +347,28 @@ ZSTD_compressBlock_fast_noDict_generic(
     } while (ip3 < ilimit);

_cleanup:
-    /* Note that there are probably still a couple positions we could search.
+    /* Note that there are probably still a couple positions one could search.
     * However, it seems to be a meaningful performance hit to try to search
     * them. So let's not. */

+    /* When the repcodes are outside of the prefix, we set them to zero before the loop.
+     * When the offsets are still zero, we need to restore them after the block to have a correct
+     * repcode history. If only one offset was invalid, it is easy.
The tricky case is when both
+     * offsets were invalid. We need to figure out which offset to refill with.
+     *    - If both offsets are zero they are in the same order.
+     *    - If both offsets are non-zero, we won't restore the offsets from `offsetSaved[12]`.
+     *    - If only one is zero, we need to decide which offset to restore.
+     *        - If rep_offset1 is non-zero, then rep_offset2 must be offsetSaved1.
+     *        - It is impossible for rep_offset2 to be non-zero.
+     *
+     * So if rep_offset1 started invalid (offsetSaved1 != 0) and became valid (rep_offset1 != 0), then
+     * set rep[0] = rep_offset1 and rep[1] = offsetSaved1.
+     */
+    offsetSaved2 = ((offsetSaved1 != 0) && (rep_offset1 != 0)) ? offsetSaved1 : offsetSaved2;
+
     /* save reps for next block */
-    rep[0] = rep_offset1 ? rep_offset1 : offsetSaved;
-    rep[1] = rep_offset2 ? rep_offset2 : offsetSaved;
+    rep[0] = rep_offset1 ? rep_offset1 : offsetSaved1;
+    rep[1] = rep_offset2 ? rep_offset2 : offsetSaved2;

     /* Return the last literals size */
     return (size_t)(iend - anchor);
@@ -264,10 +376,10 @@ ZSTD_compressBlock_fast_noDict_generic(
_offset: /* Requires: ip0, idx */

     /* Compute the offset code. */
-    match0 = base + idx;
+    match0 = base + matchIdx;
     rep_offset2 = rep_offset1;
     rep_offset1 = (U32)(ip0-match0);
-    offcode = STORE_OFFSET(rep_offset1);
+    offcode = OFFSET_TO_OFFBASE(rep_offset1);
     mLength = 4;

     /* Count the backwards match length. */
@@ -287,11 +399,6 @@ ZSTD_compressBlock_fast_noDict_generic(
     ip0 += mLength;
     anchor = ip0;

-    /* write next hash table entry */
-    if (ip1 < ip0) {
-        hashTable[hash1] = (U32)(ip1 - base);
-    }
-
     /* Fill table and check for immediate repcode. */
     if (ip0 <= ilimit) {
         /* Fill Table */
@@ -306,7 +413,7 @@ ZSTD_compressBlock_fast_noDict_generic(
                 { U32 const tmpOff = rep_offset2; rep_offset2 = rep_offset1; rep_offset1 = tmpOff; } /* swap rep_offset2 <=> rep_offset1 */
                 hashTable[ZSTD_hashPtr(ip0, hlog, mls)] = (U32)(ip0-base);
                 ip0 += rLength;
-                ZSTD_storeSeq(seqStore, 0 /*litLen*/, anchor, iend, STORE_REPCODE_1, rLength);
+                ZSTD_storeSeq(seqStore, 0 /*litLen*/, anchor, iend, REPCODE1_TO_OFFBASE, rLength);
                 anchor = ip0;
                 continue;   /* faster when present (confirmed on gcc-8) ... (?
*/ } } } @@ -314,12 +421,12 @@ ZSTD_compressBlock_fast_noDict_generic( goto _start; } =20 -#define ZSTD_GEN_FAST_FN(dictMode, mls, step) = \ - static size_t ZSTD_compressBlock_fast_##dictMode##_##mls##_##step( = \ - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_= NUM], \ +#define ZSTD_GEN_FAST_FN(dictMode, mml, cmov) = \ + static size_t ZSTD_compressBlock_fast_##dictMode##_##mml##_##cmov( = \ + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_= NUM], \ void const* src, size_t srcSize) = \ { = \ - return ZSTD_compressBlock_fast_##dictMode##_generic(ms, seqStore, = rep, src, srcSize, mls, step); \ + return ZSTD_compressBlock_fast_##dictMode##_generic(ms, seqStore, = rep, src, srcSize, mml, cmov); \ } =20 ZSTD_GEN_FAST_FN(noDict, 4, 1) @@ -333,13 +440,15 @@ ZSTD_GEN_FAST_FN(noDict, 6, 0) ZSTD_GEN_FAST_FN(noDict, 7, 0) =20 size_t ZSTD_compressBlock_fast( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - U32 const mls =3D ms->cParams.minMatch; + U32 const mml =3D ms->cParams.minMatch; + /* use cmov when "candidate in range" branch is likely unpredictable */ + int const useCmov =3D ms->cParams.windowLog < 19; assert(ms->dictMatchState =3D=3D NULL); - if (ms->cParams.targetLength > 1) { - switch(mls) + if (useCmov) { + switch(mml) { default: /* includes case 3 */ case 4 : @@ -352,7 +461,8 @@ size_t ZSTD_compressBlock_fast( return ZSTD_compressBlock_fast_noDict_7_1(ms, seqStore, rep, s= rc, srcSize); } } else { - switch(mls) + /* use a branch instead */ + switch(mml) { default: /* includes case 3 */ case 4 : @@ -364,13 +474,13 @@ size_t ZSTD_compressBlock_fast( case 7 : return ZSTD_compressBlock_fast_noDict_7_0(ms, seqStore, rep, s= rc, srcSize); } - } } =20 FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR size_t ZSTD_compressBlock_fast_dictMatchState_generic( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize, U32 const mls, U32 const hasStep) { const ZSTD_compressionParameters* const cParams =3D &ms->cParams; @@ -380,16 +490,16 @@ size_t ZSTD_compressBlock_fast_dictMatchState_generic( U32 const stepSize =3D cParams->targetLength + !(cParams->targetLength= ); const BYTE* const base =3D ms->window.base; const BYTE* const istart =3D (const BYTE*)src; - const BYTE* ip =3D istart; + const BYTE* ip0 =3D istart; + const BYTE* ip1 =3D ip0 + stepSize; /* we assert below that stepSize >= =3D 1 */ const BYTE* anchor =3D istart; const U32 prefixStartIndex =3D ms->window.dictLimit; const BYTE* const prefixStart =3D base + prefixStartIndex; const BYTE* const iend =3D istart + srcSize; const BYTE* const ilimit =3D iend - HASH_READ_SIZE; U32 offset_1=3Drep[0], offset_2=3Drep[1]; - U32 offsetSaved =3D 0; =20 - const ZSTD_matchState_t* const dms =3D ms->dictMatchState; + const ZSTD_MatchState_t* const dms =3D ms->dictMatchState; const ZSTD_compressionParameters* const dictCParams =3D &dms->cParams ; const U32* const dictHashTable =3D dms->hashTable; const U32 dictStartIndex =3D dms->window.dictLimit; @@ -397,13 +507,13 @@ size_t ZSTD_compressBlock_fast_dictMatchState_generic( const BYTE* const dictStart =3D dictBase + dictStartIndex; const BYTE* const dictEnd =3D dms->window.nextSrc; const U32 dictIndexDelta =3D prefixStartIndex - (U32)(dictEnd - = dictBase); - const U32 dictAndPrefixLength =3D (U32)(ip - prefixStart + dictEnd - = 
dictStart); - const U32 dictHLog =3D dictCParams->hashLog; + const U32 dictAndPrefixLength =3D (U32)(istart - prefixStart + dictEn= d - dictStart); + const U32 dictHBits =3D dictCParams->hashLog + ZSTD_SHORT_C= ACHE_TAG_BITS; =20 /* if a dictionary is still attached, it necessarily means that * it is within window size. So we just check it. */ const U32 maxDistance =3D 1U << cParams->windowLog; - const U32 endIndex =3D (U32)((size_t)(ip - base) + srcSize); + const U32 endIndex =3D (U32)((size_t)(istart - base) + srcSize); assert(endIndex - prefixStartIndex <=3D maxDistance); (void)maxDistance; (void)endIndex; /* these variables are not used w= hen assert() is disabled */ =20 @@ -413,106 +523,154 @@ size_t ZSTD_compressBlock_fast_dictMatchState_gener= ic( * when translating a dict index into a local index */ assert(prefixStartIndex >=3D (U32)(dictEnd - dictBase)); =20 + if (ms->prefetchCDictTables) { + size_t const hashTableBytes =3D (((size_t)1) << dictCParams->hashL= og) * sizeof(U32); + PREFETCH_AREA(dictHashTable, hashTableBytes); + } + /* init */ DEBUGLOG(5, "ZSTD_compressBlock_fast_dictMatchState_generic"); - ip +=3D (dictAndPrefixLength =3D=3D 0); + ip0 +=3D (dictAndPrefixLength =3D=3D 0); /* dictMatchState repCode checks don't currently handle repCode =3D=3D= 0 * disabling. */ assert(offset_1 <=3D dictAndPrefixLength); assert(offset_2 <=3D dictAndPrefixLength); =20 - /* Main Search Loop */ - while (ip < ilimit) { /* < instead of <=3D, because repcode check at= (ip+1) */ + /* Outer search loop */ + assert(stepSize >=3D 1); + while (ip1 <=3D ilimit) { /* repcode check at (ip0 + 1) is safe beca= use ip0 < ip1 */ size_t mLength; - size_t const h =3D ZSTD_hashPtr(ip, hlog, mls); - U32 const curr =3D (U32)(ip-base); - U32 const matchIndex =3D hashTable[h]; - const BYTE* match =3D base + matchIndex; - const U32 repIndex =3D curr + 1 - offset_1; - const BYTE* repMatch =3D (repIndex < prefixStartIndex) ? - dictBase + (repIndex - dictIndexDelta) : - base + repIndex; - hashTable[h] =3D curr; /* update hash table */ - - if ( ((U32)((prefixStartIndex-1) - repIndex) >=3D 3) /* intentiona= l underflow : ensure repIndex isn't overlapping dict + prefix */ - && (MEM_read32(repMatch) =3D=3D MEM_read32(ip+1)) ) { - const BYTE* const repMatchEnd =3D repIndex < prefixStartIndex = ? 
dictEnd : iend; - mLength =3D ZSTD_count_2segments(ip+1+4, repMatch+4, iend, rep= MatchEnd, prefixStart) + 4; - ip++; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, STO= RE_REPCODE_1, mLength); - } else if ( (matchIndex <=3D prefixStartIndex) ) { - size_t const dictHash =3D ZSTD_hashPtr(ip, dictHLog, mls); - U32 const dictMatchIndex =3D dictHashTable[dictHash]; - const BYTE* dictMatch =3D dictBase + dictMatchIndex; - if (dictMatchIndex <=3D dictStartIndex || - MEM_read32(dictMatch) !=3D MEM_read32(ip)) { - assert(stepSize >=3D 1); - ip +=3D ((ip-anchor) >> kSearchStrength) + stepSize; - continue; - } else { - /* found a dict match */ - U32 const offset =3D (U32)(curr-dictMatchIndex-dictIndexDe= lta); - mLength =3D ZSTD_count_2segments(ip+4, dictMatch+4, iend, = dictEnd, prefixStart) + 4; - while (((ip>anchor) & (dictMatch>dictStart)) - && (ip[-1] =3D=3D dictMatch[-1])) { - ip--; dictMatch--; mLength++; + size_t hash0 =3D ZSTD_hashPtr(ip0, hlog, mls); + + size_t const dictHashAndTag0 =3D ZSTD_hashPtr(ip0, dictHBits, mls); + U32 dictMatchIndexAndTag =3D dictHashTable[dictHashAndTag0 >> ZSTD= _SHORT_CACHE_TAG_BITS]; + int dictTagsMatch =3D ZSTD_comparePackedTags(dictMatchIndexAndTag,= dictHashAndTag0); + + U32 matchIndex =3D hashTable[hash0]; + U32 curr =3D (U32)(ip0 - base); + size_t step =3D stepSize; + const size_t kStepIncr =3D 1 << kSearchStrength; + const BYTE* nextStep =3D ip0 + kStepIncr; + + /* Inner search loop */ + while (1) { + const BYTE* match =3D base + matchIndex; + const U32 repIndex =3D curr + 1 - offset_1; + const BYTE* repMatch =3D (repIndex < prefixStartIndex) ? + dictBase + (repIndex - dictIndexDelta) : + base + repIndex; + const size_t hash1 =3D ZSTD_hashPtr(ip1, hlog, mls); + size_t const dictHashAndTag1 =3D ZSTD_hashPtr(ip1, dictHBits, = mls); + hashTable[hash0] =3D curr; /* update hash table */ + + if ((ZSTD_index_overlap_check(prefixStartIndex, repIndex)) + && (MEM_read32(repMatch) =3D=3D MEM_read32(ip0 + 1))) { + const BYTE* const repMatchEnd =3D repIndex < prefixStartIn= dex ? 
dictEnd : iend; + mLength =3D ZSTD_count_2segments(ip0 + 1 + 4, repMatch + 4= , iend, repMatchEnd, prefixStart) + 4; + ip0++; + ZSTD_storeSeq(seqStore, (size_t) (ip0 - anchor), anchor, i= end, REPCODE1_TO_OFFBASE, mLength); + break; + } + + if (dictTagsMatch) { + /* Found a possible dict match */ + const U32 dictMatchIndex =3D dictMatchIndexAndTag >> ZSTD_= SHORT_CACHE_TAG_BITS; + const BYTE* dictMatch =3D dictBase + dictMatchIndex; + if (dictMatchIndex > dictStartIndex && + MEM_read32(dictMatch) =3D=3D MEM_read32(ip0)) { + /* To replicate extDict parse behavior, we only use di= ct matches when the normal matchIndex is invalid */ + if (matchIndex <=3D prefixStartIndex) { + U32 const offset =3D (U32) (curr - dictMatchIndex = - dictIndexDelta); + mLength =3D ZSTD_count_2segments(ip0 + 4, dictMatc= h + 4, iend, dictEnd, prefixStart) + 4; + while (((ip0 > anchor) & (dictMatch > dictStart)) + && (ip0[-1] =3D=3D dictMatch[-1])) { + ip0--; + dictMatch--; + mLength++; + } /* catch up */ + offset_2 =3D offset_1; + offset_1 =3D offset; + ZSTD_storeSeq(seqStore, (size_t) (ip0 - anchor), a= nchor, iend, OFFSET_TO_OFFBASE(offset), mLength); + break; + } + } + } + + if (ZSTD_match4Found_cmov(ip0, match, matchIndex, prefixStartI= ndex)) { + /* found a regular match of size >=3D 4 */ + U32 const offset =3D (U32) (ip0 - match); + mLength =3D ZSTD_count(ip0 + 4, match + 4, iend) + 4; + while (((ip0 > anchor) & (match > prefixStart)) + && (ip0[-1] =3D=3D match[-1])) { + ip0--; + match--; + mLength++; } /* catch up */ offset_2 =3D offset_1; offset_1 =3D offset; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend,= STORE_OFFSET(offset), mLength); + ZSTD_storeSeq(seqStore, (size_t) (ip0 - anchor), anchor, i= end, OFFSET_TO_OFFBASE(offset), mLength); + break; } - } else if (MEM_read32(match) !=3D MEM_read32(ip)) { - /* it's not a match, and we're not going to check the dictiona= ry */ - assert(stepSize >=3D 1); - ip +=3D ((ip-anchor) >> kSearchStrength) + stepSize; - continue; - } else { - /* found a regular match */ - U32 const offset =3D (U32)(ip-match); - mLength =3D ZSTD_count(ip+4, match+4, iend) + 4; - while (((ip>anchor) & (match>prefixStart)) - && (ip[-1] =3D=3D match[-1])) { ip--; match--; mLength++;= } /* catch up */ - offset_2 =3D offset_1; - offset_1 =3D offset; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, STO= RE_OFFSET(offset), mLength); - } + + /* Prepare for next iteration */ + dictMatchIndexAndTag =3D dictHashTable[dictHashAndTag1 >> ZSTD= _SHORT_CACHE_TAG_BITS]; + dictTagsMatch =3D ZSTD_comparePackedTags(dictMatchIndexAndTag,= dictHashAndTag1); + matchIndex =3D hashTable[hash1]; + + if (ip1 >=3D nextStep) { + step++; + nextStep +=3D kStepIncr; + } + ip0 =3D ip1; + ip1 =3D ip1 + step; + if (ip1 > ilimit) goto _cleanup; + + curr =3D (U32)(ip0 - base); + hash0 =3D hash1; + } /* end inner search loop */ =20 /* match found */ - ip +=3D mLength; - anchor =3D ip; + assert(mLength); + ip0 +=3D mLength; + anchor =3D ip0; =20 - if (ip <=3D ilimit) { + if (ip0 <=3D ilimit) { /* Fill Table */ assert(base+curr+2 > istart); /* check base overflow */ hashTable[ZSTD_hashPtr(base+curr+2, hlog, mls)] =3D curr+2; /= * here because curr+2 could be > iend-8 */ - hashTable[ZSTD_hashPtr(ip-2, hlog, mls)] =3D (U32)(ip-2-base); + hashTable[ZSTD_hashPtr(ip0-2, hlog, mls)] =3D (U32)(ip0-2-base= ); =20 /* check immediate repcode */ - while (ip <=3D ilimit) { - U32 const current2 =3D (U32)(ip-base); + while (ip0 <=3D ilimit) { + U32 const current2 =3D (U32)(ip0-base); U32 const repIndex2 =3D 
current2 - offset_2; const BYTE* repMatch2 =3D repIndex2 < prefixStartIndex ? dictBase - dictIndexDelta + repIndex2 : base + repIndex2; - if ( ((U32)((prefixStartIndex-1) - (U32)repIndex2) >=3D 3 = /* intentional overflow */) - && (MEM_read32(repMatch2) =3D=3D MEM_read32(ip)) ) { + if ( (ZSTD_index_overlap_check(prefixStartIndex, repIndex2= )) + && (MEM_read32(repMatch2) =3D=3D MEM_read32(ip0))) { const BYTE* const repEnd2 =3D repIndex2 < prefixStartI= ndex ? dictEnd : iend; - size_t const repLength2 =3D ZSTD_count_2segments(ip+4,= repMatch2+4, iend, repEnd2, prefixStart) + 4; + size_t const repLength2 =3D ZSTD_count_2segments(ip0+4= , repMatch2+4, iend, repEnd2, prefixStart) + 4; U32 tmpOffset =3D offset_2; offset_2 =3D offset_1; off= set_1 =3D tmpOffset; /* swap offset_2 <=3D> offset_1 */ - ZSTD_storeSeq(seqStore, 0, anchor, iend, STORE_REPCODE= _1, repLength2); - hashTable[ZSTD_hashPtr(ip, hlog, mls)] =3D current2; - ip +=3D repLength2; - anchor =3D ip; + ZSTD_storeSeq(seqStore, 0, anchor, iend, REPCODE1_TO_O= FFBASE, repLength2); + hashTable[ZSTD_hashPtr(ip0, hlog, mls)] =3D current2; + ip0 +=3D repLength2; + anchor =3D ip0; continue; } break; } } + + /* Prepare for next iteration */ + assert(ip0 =3D=3D anchor); + ip1 =3D ip0 + stepSize; } =20 +_cleanup: /* save reps for next block */ - rep[0] =3D offset_1 ? offset_1 : offsetSaved; - rep[1] =3D offset_2 ? offset_2 : offsetSaved; + rep[0] =3D offset_1; + rep[1] =3D offset_2; =20 /* Return the last literals size */ return (size_t)(iend - anchor); @@ -525,7 +683,7 @@ ZSTD_GEN_FAST_FN(dictMatchState, 6, 0) ZSTD_GEN_FAST_FN(dictMatchState, 7, 0) =20 size_t ZSTD_compressBlock_fast_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { U32 const mls =3D ms->cParams.minMatch; @@ -545,19 +703,20 @@ size_t ZSTD_compressBlock_fast_dictMatchState( } =20 =20 -static size_t ZSTD_compressBlock_fast_extDict_generic( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t ZSTD_compressBlock_fast_extDict_generic( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize, U32 const mls, U32 const hasStep) { const ZSTD_compressionParameters* const cParams =3D &ms->cParams; U32* const hashTable =3D ms->hashTable; U32 const hlog =3D cParams->hashLog; /* support stepSize of 0 */ - U32 const stepSize =3D cParams->targetLength + !(cParams->targetLength= ); + size_t const stepSize =3D cParams->targetLength + !(cParams->targetLen= gth) + 1; const BYTE* const base =3D ms->window.base; const BYTE* const dictBase =3D ms->window.dictBase; const BYTE* const istart =3D (const BYTE*)src; - const BYTE* ip =3D istart; const BYTE* anchor =3D istart; const U32 endIndex =3D (U32)((size_t)(istart - base) + srcSize); const U32 lowLimit =3D ZSTD_getLowestMatchIndex(ms, endIndex, cParam= s->windowLog); @@ -570,6 +729,28 @@ static size_t ZSTD_compressBlock_fast_extDict_generic( const BYTE* const iend =3D istart + srcSize; const BYTE* const ilimit =3D iend - 8; U32 offset_1=3Drep[0], offset_2=3Drep[1]; + U32 offsetSaved1 =3D 0, offsetSaved2 =3D 0; + + const BYTE* ip0 =3D istart; + const BYTE* ip1; + const BYTE* ip2; + const BYTE* ip3; + U32 current0; + + + size_t hash0; /* hash for ip0 */ + size_t hash1; /* hash for ip1 */ + U32 idx; /* match idx for ip0 */ + const BYTE* idxBase; /* base pointer for idx */ + + U32 offcode; + const BYTE* 
match0; + size_t mLength; + const BYTE* matchEnd =3D 0; /* initialize to avoid warning, assert != =3D 0 later */ + + size_t step; + const BYTE* nextStep; + const size_t kStepIncr =3D (1 << (kSearchStrength - 1)); =20 (void)hasStep; /* not currently specialized on whether it's accelerate= d */ =20 @@ -579,75 +760,202 @@ static size_t ZSTD_compressBlock_fast_extDict_generi= c( if (prefixStartIndex =3D=3D dictStartIndex) return ZSTD_compressBlock_fast(ms, seqStore, rep, src, srcSize); =20 - /* Search Loop */ - while (ip < ilimit) { /* < instead of <=3D, because (ip+1) */ - const size_t h =3D ZSTD_hashPtr(ip, hlog, mls); - const U32 matchIndex =3D hashTable[h]; - const BYTE* const matchBase =3D matchIndex < prefixStartIndex ? di= ctBase : base; - const BYTE* match =3D matchBase + matchIndex; - const U32 curr =3D (U32)(ip-base); - const U32 repIndex =3D curr + 1 - offset_1; - const BYTE* const repBase =3D repIndex < prefixStartIndex ? dictBa= se : base; - const BYTE* const repMatch =3D repBase + repIndex; - hashTable[h] =3D curr; /* update hash table */ - DEBUGLOG(7, "offset_1 =3D %u , curr =3D %u", offset_1, curr); - - if ( ( ((U32)((prefixStartIndex-1) - repIndex) >=3D 3) /* intentio= nal underflow */ - & (offset_1 <=3D curr+1 - dictStartIndex) ) /* note: we are s= earching at curr+1 */ - && (MEM_read32(repMatch) =3D=3D MEM_read32(ip+1)) ) { - const BYTE* const repMatchEnd =3D repIndex < prefixStartIndex = ? dictEnd : iend; - size_t const rLength =3D ZSTD_count_2segments(ip+1 +4, repMatc= h +4, iend, repMatchEnd, prefixStart) + 4; - ip++; - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend, STO= RE_REPCODE_1, rLength); - ip +=3D rLength; - anchor =3D ip; - } else { - if ( (matchIndex < dictStartIndex) || - (MEM_read32(match) !=3D MEM_read32(ip)) ) { - assert(stepSize >=3D 1); - ip +=3D ((ip-anchor) >> kSearchStrength) + stepSize; - continue; + { U32 const curr =3D (U32)(ip0 - base); + U32 const maxRep =3D curr - dictStartIndex; + if (offset_2 >=3D maxRep) offsetSaved2 =3D offset_2, offset_2 =3D = 0; + if (offset_1 >=3D maxRep) offsetSaved1 =3D offset_1, offset_1 =3D = 0; + } + + /* start each op */ +_start: /* Requires: ip0 */ + + step =3D stepSize; + nextStep =3D ip0 + kStepIncr; + + /* calculate positions, ip0 - anchor =3D=3D 0, so we skip step calc */ + ip1 =3D ip0 + 1; + ip2 =3D ip0 + step; + ip3 =3D ip2 + 1; + + if (ip3 >=3D ilimit) { + goto _cleanup; + } + + hash0 =3D ZSTD_hashPtr(ip0, hlog, mls); + hash1 =3D ZSTD_hashPtr(ip1, hlog, mls); + + idx =3D hashTable[hash0]; + idxBase =3D idx < prefixStartIndex ? dictBase : base; + + do { + { /* load repcode match for ip[2] */ + U32 const current2 =3D (U32)(ip2 - base); + U32 const repIndex =3D current2 - offset_1; + const BYTE* const repBase =3D repIndex < prefixStartIndex ? di= ctBase : base; + U32 rval; + if ( ((U32)(prefixStartIndex - repIndex) >=3D 4) /* intentiona= l underflow */ + & (offset_1 > 0) ) { + rval =3D MEM_read32(repBase + repIndex); + } else { + rval =3D MEM_read32(ip2) ^ 1; /* guaranteed to not match. = */ } - { const BYTE* const matchEnd =3D matchIndex < prefixStartInd= ex ? dictEnd : iend; - const BYTE* const lowMatchPtr =3D matchIndex < prefixStart= Index ? 
dictStart : prefixStart; - U32 const offset =3D curr - matchIndex; - size_t mLength =3D ZSTD_count_2segments(ip+4, match+4, ien= d, matchEnd, prefixStart) + 4; - while (((ip>anchor) & (match>lowMatchPtr)) && (ip[-1] =3D= =3D match[-1])) { ip--; match--; mLength++; } /* catch up */ - offset_2 =3D offset_1; offset_1 =3D offset; /* update off= set history */ - ZSTD_storeSeq(seqStore, (size_t)(ip-anchor), anchor, iend,= STORE_OFFSET(offset), mLength); - ip +=3D mLength; - anchor =3D ip; + + /* write back hash table entry */ + current0 =3D (U32)(ip0 - base); + hashTable[hash0] =3D current0; + + /* check repcode at ip[2] */ + if (MEM_read32(ip2) =3D=3D rval) { + ip0 =3D ip2; + match0 =3D repBase + repIndex; + matchEnd =3D repIndex < prefixStartIndex ? dictEnd : iend; + assert((match0 !=3D prefixStart) & (match0 !=3D dictStart)= ); + mLength =3D ip0[-1] =3D=3D match0[-1]; + ip0 -=3D mLength; + match0 -=3D mLength; + offcode =3D REPCODE1_TO_OFFBASE; + mLength +=3D 4; + goto _match; } } =20 - if (ip <=3D ilimit) { - /* Fill Table */ - hashTable[ZSTD_hashPtr(base+curr+2, hlog, mls)] =3D curr+2; - hashTable[ZSTD_hashPtr(ip-2, hlog, mls)] =3D (U32)(ip-2-base); - /* check immediate repcode */ - while (ip <=3D ilimit) { - U32 const current2 =3D (U32)(ip-base); - U32 const repIndex2 =3D current2 - offset_2; - const BYTE* const repMatch2 =3D repIndex2 < prefixStartInd= ex ? dictBase + repIndex2 : base + repIndex2; - if ( (((U32)((prefixStartIndex-1) - repIndex2) >=3D 3) & (= offset_2 <=3D curr - dictStartIndex)) /* intentional overflow */ - && (MEM_read32(repMatch2) =3D=3D MEM_read32(ip)) ) { - const BYTE* const repEnd2 =3D repIndex2 < prefixStartI= ndex ? dictEnd : iend; - size_t const repLength2 =3D ZSTD_count_2segments(ip+4,= repMatch2+4, iend, repEnd2, prefixStart) + 4; - { U32 const tmpOffset =3D offset_2; offset_2 =3D offse= t_1; offset_1 =3D tmpOffset; } /* swap offset_2 <=3D> offset_1 */ - ZSTD_storeSeq(seqStore, 0 /*litlen*/, anchor, iend, ST= ORE_REPCODE_1, repLength2); - hashTable[ZSTD_hashPtr(ip, hlog, mls)] =3D current2; - ip +=3D repLength2; - anchor =3D ip; - continue; - } - break; - } } } + { /* load match for ip[0] */ + U32 const mval =3D idx >=3D dictStartIndex ? + MEM_read32(idxBase + idx) : + MEM_read32(ip0) ^ 1; /* guaranteed not to match */ + + /* check match at ip[0] */ + if (MEM_read32(ip0) =3D=3D mval) { + /* found a match! */ + goto _offset; + } } + + /* lookup ip[1] */ + idx =3D hashTable[hash1]; + idxBase =3D idx < prefixStartIndex ? dictBase : base; + + /* hash ip[2] */ + hash0 =3D hash1; + hash1 =3D ZSTD_hashPtr(ip2, hlog, mls); + + /* advance to next positions */ + ip0 =3D ip1; + ip1 =3D ip2; + ip2 =3D ip3; + + /* write back hash table entry */ + current0 =3D (U32)(ip0 - base); + hashTable[hash0] =3D current0; + + { /* load match for ip[0] */ + U32 const mval =3D idx >=3D dictStartIndex ? + MEM_read32(idxBase + idx) : + MEM_read32(ip0) ^ 1; /* guaranteed not to match */ + + /* check match at ip[0] */ + if (MEM_read32(ip0) =3D=3D mval) { + /* found a match! */ + goto _offset; + } } + + /* lookup ip[1] */ + idx =3D hashTable[hash1]; + idxBase =3D idx < prefixStartIndex ? 
dictBase : base; + + /* hash ip[2] */ + hash0 =3D hash1; + hash1 =3D ZSTD_hashPtr(ip2, hlog, mls); + + /* advance to next positions */ + ip0 =3D ip1; + ip1 =3D ip2; + ip2 =3D ip0 + step; + ip3 =3D ip1 + step; + + /* calculate step */ + if (ip2 >=3D nextStep) { + step++; + PREFETCH_L1(ip1 + 64); + PREFETCH_L1(ip1 + 128); + nextStep +=3D kStepIncr; + } + } while (ip3 < ilimit); + +_cleanup: + /* Note that there are probably still a couple positions we could sear= ch. + * However, it seems to be a meaningful performance hit to try to sear= ch + * them. So let's not. */ + + /* If offset_1 started invalid (offsetSaved1 !=3D 0) and became valid = (offset_1 !=3D 0), + * rotate saved offsets. See comment in ZSTD_compressBlock_fast_noDict= for more context. */ + offsetSaved2 =3D ((offsetSaved1 !=3D 0) && (offset_1 !=3D 0)) ? offset= Saved1 : offsetSaved2; =20 /* save reps for next block */ - rep[0] =3D offset_1; - rep[1] =3D offset_2; + rep[0] =3D offset_1 ? offset_1 : offsetSaved1; + rep[1] =3D offset_2 ? offset_2 : offsetSaved2; =20 /* Return the last literals size */ return (size_t)(iend - anchor); + +_offset: /* Requires: ip0, idx, idxBase */ + + /* Compute the offset code. */ + { U32 const offset =3D current0 - idx; + const BYTE* const lowMatchPtr =3D idx < prefixStartIndex ? dictSta= rt : prefixStart; + matchEnd =3D idx < prefixStartIndex ? dictEnd : iend; + match0 =3D idxBase + idx; + offset_2 =3D offset_1; + offset_1 =3D offset; + offcode =3D OFFSET_TO_OFFBASE(offset); + mLength =3D 4; + + /* Count the backwards match length. */ + while (((ip0>anchor) & (match0>lowMatchPtr)) && (ip0[-1] =3D=3D ma= tch0[-1])) { + ip0--; + match0--; + mLength++; + } } + +_match: /* Requires: ip0, match0, offcode, matchEnd */ + + /* Count the forward length. */ + assert(matchEnd !=3D 0); + mLength +=3D ZSTD_count_2segments(ip0 + mLength, match0 + mLength, ien= d, matchEnd, prefixStart); + + ZSTD_storeSeq(seqStore, (size_t)(ip0 - anchor), anchor, iend, offcode,= mLength); + + ip0 +=3D mLength; + anchor =3D ip0; + + /* write next hash table entry */ + if (ip1 < ip0) { + hashTable[hash1] =3D (U32)(ip1 - base); + } + + /* Fill table and check for immediate repcode. */ + if (ip0 <=3D ilimit) { + /* Fill Table */ + assert(base+current0+2 > istart); /* check base overflow */ + hashTable[ZSTD_hashPtr(base+current0+2, hlog, mls)] =3D current0+2= ; /* here because current+2 could be > iend-8 */ + hashTable[ZSTD_hashPtr(ip0-2, hlog, mls)] =3D (U32)(ip0-2-base); + + while (ip0 <=3D ilimit) { + U32 const repIndex2 =3D (U32)(ip0-base) - offset_2; + const BYTE* const repMatch2 =3D repIndex2 < prefixStartIndex ?= dictBase + repIndex2 : base + repIndex2; + if ( ((ZSTD_index_overlap_check(prefixStartIndex, repIndex2)) = & (offset_2 > 0)) + && (MEM_read32(repMatch2) =3D=3D MEM_read32(ip0)) ) { + const BYTE* const repEnd2 =3D repIndex2 < prefixStartIndex= ? 
dictEnd : iend; + size_t const repLength2 =3D ZSTD_count_2segments(ip0+4, re= pMatch2+4, iend, repEnd2, prefixStart) + 4; + { U32 const tmpOffset =3D offset_2; offset_2 =3D offset_1;= offset_1 =3D tmpOffset; } /* swap offset_2 <=3D> offset_1 */ + ZSTD_storeSeq(seqStore, 0 /*litlen*/, anchor, iend, REPCOD= E1_TO_OFFBASE, repLength2); + hashTable[ZSTD_hashPtr(ip0, hlog, mls)] =3D (U32)(ip0-base= ); + ip0 +=3D repLength2; + anchor =3D ip0; + continue; + } + break; + } } + + goto _start; } =20 ZSTD_GEN_FAST_FN(extDict, 4, 0) @@ -656,10 +964,11 @@ ZSTD_GEN_FAST_FN(extDict, 6, 0) ZSTD_GEN_FAST_FN(extDict, 7, 0) =20 size_t ZSTD_compressBlock_fast_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { U32 const mls =3D ms->cParams.minMatch; + assert(ms->dictMatchState =3D=3D NULL); switch(mls) { default: /* includes case 3 */ diff --git a/lib/zstd/compress/zstd_fast.h b/lib/zstd/compress/zstd_fast.h index fddc2f532d21..04fde0a72a4e 100644 --- a/lib/zstd/compress/zstd_fast.h +++ b/lib/zstd/compress/zstd_fast.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -11,21 +12,20 @@ #ifndef ZSTD_FAST_H #define ZSTD_FAST_H =20 - #include "../common/mem.h" /* U32 */ #include "zstd_compress_internal.h" =20 -void ZSTD_fillHashTable(ZSTD_matchState_t* ms, - void const* end, ZSTD_dictTableLoadMethod_e dtlm); +void ZSTD_fillHashTable(ZSTD_MatchState_t* ms, + void const* end, ZSTD_dictTableLoadMethod_e dtlm, + ZSTD_tableFillPurpose_e tfp); size_t ZSTD_compressBlock_fast( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_fast_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_fast_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); =20 - #endif /* ZSTD_FAST_H */ diff --git a/lib/zstd/compress/zstd_lazy.c b/lib/zstd/compress/zstd_lazy.c index 0298a01a7504..88e2501fe3ef 100644 --- a/lib/zstd/compress/zstd_lazy.c +++ b/lib/zstd/compress/zstd_lazy.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -10,14 +11,23 @@ =20 #include "zstd_compress_internal.h" #include "zstd_lazy.h" +#include "../common/bits.h" /* ZSTD_countTrailingZeros64 */ + +#if !defined(ZSTD_EXCLUDE_GREEDY_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_LAZY_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_LAZY2_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR) + +#define kLazySkippingStep 8 =20 =20 /*-************************************* * Binary Tree search ***************************************/ =20 -static void -ZSTD_updateDUBT(ZSTD_matchState_t* ms, +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +void ZSTD_updateDUBT(ZSTD_MatchState_t* ms, const BYTE* ip, const BYTE* iend, U32 mls) { @@ -60,8 +70,9 @@ ZSTD_updateDUBT(ZSTD_matchState_t* ms, * sort one already inserted but unsorted position * assumption : curr >=3D btlow =3D=3D (curr - btmask) * doesn't fail */ -static void -ZSTD_insertDUBT1(const ZSTD_matchState_t* ms, +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +void ZSTD_insertDUBT1(const ZSTD_MatchState_t* ms, U32 curr, const BYTE* inputEnd, U32 nbCompares, U32 btLow, const ZSTD_dictMode_e dictMode) @@ -149,9 +160,10 @@ ZSTD_insertDUBT1(const ZSTD_matchState_t* ms, } =20 =20 -static size_t -ZSTD_DUBT_findBetterDictMatch ( - const ZSTD_matchState_t* ms, +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t ZSTD_DUBT_findBetterDictMatch ( + const ZSTD_MatchState_t* ms, const BYTE* const ip, const BYTE* const iend, size_t* offsetPtr, size_t bestLength, @@ -159,7 +171,7 @@ ZSTD_DUBT_findBetterDictMatch ( U32 const mls, const ZSTD_dictMode_e dictMode) { - const ZSTD_matchState_t * const dms =3D ms->dictMatchState; + const ZSTD_MatchState_t * const dms =3D ms->dictMatchState; const ZSTD_compressionParameters* const dmsCParams =3D &dms->cParams; const U32 * const dictHashTable =3D dms->hashTable; U32 const hashLog =3D dmsCParams->hashLog; @@ -197,8 +209,8 @@ ZSTD_DUBT_findBetterDictMatch ( U32 matchIndex =3D dictMatchIndex + dictIndexDelta; if ( (4*(int)(matchLength-bestLength)) > (int)(ZSTD_highbit32(= curr-matchIndex+1) - ZSTD_highbit32((U32)offsetPtr[0]+1)) ) { DEBUGLOG(9, "ZSTD_DUBT_findBetterDictMatch(%u) : found bet= ter match length %u -> %u and offsetCode %u -> %u (dictMatchIndex %u, match= Index %u)", - curr, (U32)bestLength, (U32)matchLength, (U32)*offsetP= tr, STORE_OFFSET(curr - matchIndex), dictMatchIndex, matchIndex); - bestLength =3D matchLength, *offsetPtr =3D STORE_OFFSET(cu= rr - matchIndex); + curr, (U32)bestLength, (U32)matchLength, (U32)*offsetP= tr, OFFSET_TO_OFFBASE(curr - matchIndex), dictMatchIndex, matchIndex); + bestLength =3D matchLength, *offsetPtr =3D OFFSET_TO_OFFBA= SE(curr - matchIndex); } if (ip+matchLength =3D=3D iend) { /* reached end of input : = ip[matchLength] is not valid, no way to know if it's larger or smaller than= match */ break; /* drop, to guarantee consistency (miss a little = bit of compression) */ @@ -218,7 +230,7 @@ ZSTD_DUBT_findBetterDictMatch ( } =20 if (bestLength >=3D MINMATCH) { - U32 const mIndex =3D curr - (U32)STORED_OFFSET(*offsetPtr); (void)= mIndex; + U32 const mIndex =3D curr - (U32)OFFBASE_TO_OFFSET(*offsetPtr); (v= oid)mIndex; DEBUGLOG(8, "ZSTD_DUBT_findBetterDictMatch(%u) : found match of le= ngth %u and offsetCode %u (pos %u)", curr, (U32)bestLength, (U32)*offsetPtr, mIndex); } @@ -227,10 +239,11 @@ ZSTD_DUBT_findBetterDictMatch ( } =20 =20 -static size_t -ZSTD_DUBT_findBestMatch(ZSTD_matchState_t* ms, +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t 
ZSTD_DUBT_findBestMatch(ZSTD_MatchState_t* ms, const BYTE* const ip, const BYTE* const iend, - size_t* offsetPtr, + size_t* offBasePtr, U32 const mls, const ZSTD_dictMode_e dictMode) { @@ -327,8 +340,8 @@ ZSTD_DUBT_findBestMatch(ZSTD_matchState_t* ms, if (matchLength > bestLength) { if (matchLength > matchEndIdx - matchIndex) matchEndIdx =3D matchIndex + (U32)matchLength; - if ( (4*(int)(matchLength-bestLength)) > (int)(ZSTD_highbi= t32(curr-matchIndex+1) - ZSTD_highbit32((U32)offsetPtr[0]+1)) ) - bestLength =3D matchLength, *offsetPtr =3D STORE_OFFSE= T(curr - matchIndex); + if ( (4*(int)(matchLength-bestLength)) > (int)(ZSTD_highbi= t32(curr - matchIndex + 1) - ZSTD_highbit32((U32)*offBasePtr)) ) + bestLength =3D matchLength, *offBasePtr =3D OFFSET_TO_= OFFBASE(curr - matchIndex); if (ip+matchLength =3D=3D iend) { /* equal : no way to k= now if inf or sup */ if (dictMode =3D=3D ZSTD_dictMatchState) { nbCompares =3D 0; /* in addition to avoiding check= ing any @@ -361,16 +374,16 @@ ZSTD_DUBT_findBestMatch(ZSTD_matchState_t* ms, if (dictMode =3D=3D ZSTD_dictMatchState && nbCompares) { bestLength =3D ZSTD_DUBT_findBetterDictMatch( ms, ip, iend, - offsetPtr, bestLength, nbCompares, + offBasePtr, bestLength, nbCompares, mls, dictMode); } =20 assert(matchEndIdx > curr+8); /* ensure nextToUpdate is increased = */ ms->nextToUpdate =3D matchEndIdx - 8; /* skip repetitive pattern= s */ if (bestLength >=3D MINMATCH) { - U32 const mIndex =3D curr - (U32)STORED_OFFSET(*offsetPtr); (v= oid)mIndex; + U32 const mIndex =3D curr - (U32)OFFBASE_TO_OFFSET(*offBasePtr= ); (void)mIndex; DEBUGLOG(8, "ZSTD_DUBT_findBestMatch(%u) : found match of leng= th %u and offsetCode %u (pos %u)", - curr, (U32)bestLength, (U32)*offsetPtr, mIndex); + curr, (U32)bestLength, (U32)*offBasePtr, mIndex); } return bestLength; } @@ -378,24 +391,25 @@ ZSTD_DUBT_findBestMatch(ZSTD_matchState_t* ms, =20 =20 /* ZSTD_BtFindBestMatch() : Tree updater, providing best match */ -FORCE_INLINE_TEMPLATE size_t -ZSTD_BtFindBestMatch( ZSTD_matchState_t* ms, +FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t ZSTD_BtFindBestMatch( ZSTD_MatchState_t* ms, const BYTE* const ip, const BYTE* const iLimit, - size_t* offsetPtr, + size_t* offBasePtr, const U32 mls /* template */, const ZSTD_dictMode_e dictMode) { DEBUGLOG(7, "ZSTD_BtFindBestMatch"); if (ip < ms->window.base + ms->nextToUpdate) return 0; /* skipped ar= ea */ ZSTD_updateDUBT(ms, ip, iLimit, mls); - return ZSTD_DUBT_findBestMatch(ms, ip, iLimit, offsetPtr, mls, dictMod= e); + return ZSTD_DUBT_findBestMatch(ms, ip, iLimit, offBasePtr, mls, dictMo= de); } =20 /* ********************************* * Dedicated dict search ***********************************/ =20 -void ZSTD_dedicatedDictSearch_lazy_loadDictionary(ZSTD_matchState_t* ms, c= onst BYTE* const ip) +void ZSTD_dedicatedDictSearch_lazy_loadDictionary(ZSTD_MatchState_t* ms, c= onst BYTE* const ip) { const BYTE* const base =3D ms->window.base; U32 const target =3D (U32)(ip - base); @@ -514,7 +528,7 @@ void ZSTD_dedicatedDictSearch_lazy_loadDictionary(ZSTD_= matchState_t* ms, const B */ FORCE_INLINE_TEMPLATE size_t ZSTD_dedicatedDictSearch_lazy_search(size_t* offsetPtr, size_t ml, = U32 nbAttempts, - const ZSTD_matchState_t* const= dms, + const ZSTD_MatchState_t* const= dms, const BYTE* const ip, const BY= TE* const iLimit, const BYTE* const prefixStart,= const U32 curr, const U32 dictLimit, const siz= e_t ddsIdx) { @@ -561,7 +575,7 @@ size_t ZSTD_dedicatedDictSearch_lazy_search(size_t* off= setPtr, size_t ml, U32 nb /* save 
best solution */
         if (currentMl > ml) {
             ml = currentMl;
-            *offsetPtr = STORE_OFFSET(curr - (matchIndex + ddsIndexDelta));
+            *offsetPtr = OFFSET_TO_OFFBASE(curr - (matchIndex + ddsIndexDelta));
             if (ip+currentMl == iLimit) {
                 /* best possible, avoids read overflow on next attempt */
                 return ml;
@@ -598,7 +612,7 @@ size_t ZSTD_dedicatedDictSearch_lazy_search(size_t* offsetPtr, size_t ml, U32 nb
             /* save best solution */
             if (currentMl > ml) {
                 ml = currentMl;
-                *offsetPtr = STORE_OFFSET(curr - (matchIndex + ddsIndexDelta));
+                *offsetPtr = OFFSET_TO_OFFBASE(curr - (matchIndex + ddsIndexDelta));
                 if (ip+currentMl == iLimit) break; /* best possible, avoids read overflow on next attempt */
             }
         }
@@ -614,10 +628,12 @@ size_t ZSTD_dedicatedDictSearch_lazy_search(size_t* offsetPtr, size_t ml, U32 nb

 /* Update chains up to ip (excluded)
    Assumption : always within prefix (i.e. not within extDict) */
-FORCE_INLINE_TEMPLATE U32 ZSTD_insertAndFindFirstIndex_internal(
-                        ZSTD_matchState_t* ms,
+FORCE_INLINE_TEMPLATE
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+U32 ZSTD_insertAndFindFirstIndex_internal(
+                        ZSTD_MatchState_t* ms,
                         const ZSTD_compressionParameters* const cParams,
-                        const BYTE* ip, U32 const mls)
+                        const BYTE* ip, U32 const mls, U32 const lazySkipping)
 {
     U32* const hashTable  = ms->hashTable;
     const U32 hashLog = cParams->hashLog;
@@ -632,21 +648,25 @@ FORCE_INLINE_TEMPLATE U32 ZSTD_insertAndFindFirstIndex_internal(
         NEXT_IN_CHAIN(idx, chainMask) = hashTable[h];
         hashTable[h] = idx;
         idx++;
+        /* Stop inserting every position when in the lazy skipping mode. */
+        if (lazySkipping)
+            break;
     }

     ms->nextToUpdate = target;
     return hashTable[ZSTD_hashPtr(ip, hashLog, mls)];
 }

-U32 ZSTD_insertAndFindFirstIndex(ZSTD_matchState_t* ms, const BYTE* ip) {
+U32 ZSTD_insertAndFindFirstIndex(ZSTD_MatchState_t* ms, const BYTE* ip) {
     const ZSTD_compressionParameters* const cParams = &ms->cParams;
-    return ZSTD_insertAndFindFirstIndex_internal(ms, cParams, ip, ms->cParams.minMatch);
+    return ZSTD_insertAndFindFirstIndex_internal(ms, cParams, ip, ms->cParams.minMatch, /* lazySkipping*/ 0);
 }
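The new lazySkipping parameter above implements a simple bet: once the lazy parsers start striding over incompressible data, threading every intermediate position onto the hash chain costs more than the matches it might recover, so only the position actually being searched gets inserted. A self-contained sketch of the idea (toy hash and table sizes, not the patch's real parameters):

    /* Sketch: hash-chain insertion that degrades to a single insert
     * per lookup while the caller is in skipping mode. */
    #define TOY_BUCKETS    1024u
    #define TOY_CHAIN_MASK 0xFFFFu

    static unsigned toy_hash(const unsigned char* p)   /* needs 4 readable bytes */
    {
        unsigned v = (unsigned)p[0] | ((unsigned)p[1] << 8)
                   | ((unsigned)p[2] << 16) | ((unsigned)p[3] << 24);
        return (v * 2654435761u) >> 22;   /* 10-bit bucket index */
    }

    static void insert_up_to(unsigned* head, unsigned* chain,
                             const unsigned char* base,
                             unsigned idx, unsigned target, int lazySkipping)
    {
        for (; idx < target; ++idx) {
            unsigned const h = toy_hash(base + idx);
            chain[idx & TOY_CHAIN_MASK] = head[h];  /* link previous bucket head */
            head[h] = idx;
            if (lazySkipping)
                break;   /* skip mode: record only this one position */
        }
        /* caller advances its nextToUpdate cursor to target either way,
         * so positions skipped here are never revisited */
    }

The never-inserted positions simply become invisible to later searches, which costs a little ratio if the data turns compressible again; that is why the mode is (roughly) only engaged after the search step size has already grown large.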

 /* inlining is important to hardwire a hot branch (template emulation) */
 FORCE_INLINE_TEMPLATE
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
 size_t ZSTD_HcFindBestMatch(
-                        ZSTD_matchState_t* ms,
+                        ZSTD_MatchState_t* ms,
                         const BYTE* const ip, const BYTE* const iLimit,
                         size_t* offsetPtr,
                         const U32 mls, const ZSTD_dictMode_e dictMode)
@@ -670,7 +690,7 @@ size_t ZSTD_HcFindBestMatch(
     U32 nbAttempts = 1U << cParams->searchLog;
     size_t ml=4-1;

-    const ZSTD_matchState_t* const dms = ms->dictMatchState;
+    const ZSTD_MatchState_t* const dms = ms->dictMatchState;
     const U32 ddsHashLog = dictMode == ZSTD_dedicatedDictSearch ?
                            dms->cParams.hashLog - ZSTD_LAZY_DDSS_BUCKET_LOG : 0;
     const size_t ddsIdx = dictMode == ZSTD_dedicatedDictSearch
@@ -684,14 +704,15 @@ size_t ZSTD_HcFindBestMatch(
     }

     /* HC4 match finder */
-    matchIndex = ZSTD_insertAndFindFirstIndex_internal(ms, cParams, ip, mls);
+    matchIndex = ZSTD_insertAndFindFirstIndex_internal(ms, cParams, ip, mls, ms->lazySkipping);

     for ( ; (matchIndex>=lowLimit) & (nbAttempts>0) ; nbAttempts--) {
         size_t currentMl=0;
         if ((dictMode != ZSTD_extDict) || matchIndex >= dictLimit) {
             const BYTE* const match = base + matchIndex;
             assert(matchIndex >= dictLimit);   /* ensures this is true if dictMode != ZSTD_extDict */
-            if (match[ml] == ip[ml])   /* potentially better */
+            /* read 4B starting from (match + ml + 1 - sizeof(U32)) */
+            if (MEM_read32(match + ml - 3) == MEM_read32(ip + ml - 3))   /* potentially better */
                 currentMl = ZSTD_count(ip, match, iLimit);
         } else {
             const BYTE* const match = dictBase + matchIndex;
@@ -703,7 +724,7 @@ size_t ZSTD_HcFindBestMatch(
         /* save best solution */
         if (currentMl > ml) {
             ml = currentMl;
-            *offsetPtr = STORE_OFFSET(curr - matchIndex);
+            *offsetPtr = OFFSET_TO_OFFBASE(curr - matchIndex);
             if (ip+currentMl == iLimit) break; /* best possible, avoids read overflow on next attempt */
         }

@@ -739,7 +760,7 @@ size_t ZSTD_HcFindBestMatch(
                 if (currentMl > ml) {
                     ml = currentMl;
                     assert(curr > matchIndex + dmsIndexDelta);
-                    *offsetPtr = STORE_OFFSET(curr - (matchIndex + dmsIndexDelta));
+                    *offsetPtr = OFFSET_TO_OFFBASE(curr - (matchIndex + dmsIndexDelta));
                     if (ip+currentMl == iLimit) break; /* best possible, avoids read overflow on next attempt */
                 }

@@ -756,8 +777,6 @@ size_t ZSTD_HcFindBestMatch(
 * (SIMD) Row-based matchfinder
 ***********************************/
 /* Constants for row-based hash */
-#define ZSTD_ROW_HASH_TAG_OFFSET 16     /* byte offset of hashes in the match state's tagTable from the beginning of a row */
-#define ZSTD_ROW_HASH_TAG_BITS 8        /* nb bits to use for the tag */
 #define ZSTD_ROW_HASH_TAG_MASK ((1u << ZSTD_ROW_HASH_TAG_BITS) - 1)
 #define ZSTD_ROW_HASH_MAX_ENTRIES 64    /* absolute maximum number of entries per row, for all configurations */

@@ -769,64 +788,19 @@ typedef U64 ZSTD_VecMask;   /* Clarifies when we are interacting with a U64 repr
 * Starting from the LSB, returns the idx of the next non-zero bit.
 * Basically counting the nb of trailing zeroes.
 */
-static U32 ZSTD_VecMask_next(ZSTD_VecMask val) {
-    assert(val != 0);
-#   if (defined(__GNUC__) && ((__GNUC__ > 3) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 4))))
-    if (sizeof(size_t) == 4) {
-        U32 mostSignificantWord = (U32)(val >> 32);
-        U32 leastSignificantWord = (U32)val;
-        if (leastSignificantWord == 0) {
-            return 32 + (U32)__builtin_ctz(mostSignificantWord);
-        } else {
-            return (U32)__builtin_ctz(leastSignificantWord);
-        }
-    } else {
-        return (U32)__builtin_ctzll(val);
-    }
-#   else
-    /* Software ctz version: http://aggregate.org/MAGIC/#Trailing%20Zero%20Count
-     * and: https://stackoverflow.com/questions/2709430/count-number-of-bits-in-a-64-bit-long-big-integer
-     */
-    val = ~val & (val - 1ULL); /* Lowest set bit mask */
-    val = val - ((val >> 1) & 0x5555555555555555);
-    val = (val & 0x3333333333333333ULL) + ((val >> 2) & 0x3333333333333333ULL);
-    return (U32)((((val + (val >> 4)) & 0xF0F0F0F0F0F0F0FULL) * 0x101010101010101ULL) >> 56);
-#   endif
-}
-
-/* ZSTD_rotateRight_*():
- * Rotates a bitfield to the right by "count" bits.
- * https://en.wikipedia.org/w/index.php?title=Circular_shift&oldid=991635599#Implementing_circular_shifts
- */
-FORCE_INLINE_TEMPLATE
-U64 ZSTD_rotateRight_U64(U64 const value, U32 count) {
-    assert(count < 64);
-    count &= 0x3F; /* for fickle pattern recognition */
-    return (value >> count) | (U64)(value << ((0U - count) & 0x3F));
-}
-
-FORCE_INLINE_TEMPLATE
-U32 ZSTD_rotateRight_U32(U32 const value, U32 count) {
-    assert(count < 32);
-    count &= 0x1F; /* for fickle pattern recognition */
-    return (value >> count) | (U32)(value << ((0U - count) & 0x1F));
-}
-
-FORCE_INLINE_TEMPLATE
-U16 ZSTD_rotateRight_U16(U16 const value, U32 count) {
-    assert(count < 16);
-    count &= 0x0F; /* for fickle pattern recognition */
-    return (value >> count) | (U16)(value << ((0U - count) & 0x0F));
+MEM_STATIC U32 ZSTD_VecMask_next(ZSTD_VecMask val) {
+    return ZSTD_countTrailingZeros64(val);
 }

 /* ZSTD_row_nextIndex():
  * Returns the next index to insert at within a tagTable row, and updates the "head"
- * value to reflect the update. Essentially cycles backwards from [0, {entries per row})
+ * value to reflect the update. Essentially cycles backwards from [1, {entries per row})
  */
 FORCE_INLINE_TEMPLATE U32 ZSTD_row_nextIndex(BYTE* const tagRow, U32 const rowMask) {
-    U32 const next = (*tagRow - 1) & rowMask;
-    *tagRow = (BYTE)next;
-    return next;
+    U32 next = (*tagRow-1) & rowMask;
+    next += (next == 0) ? rowMask : 0; /* skip first position */
+    *tagRow = (BYTE)next;
+    return next;
 }
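The reworked ZSTD_row_nextIndex above deliberately never returns slot 0: with the tagTable now a plain byte array, the first byte of each row appears to double as the row's own insertion cursor (the removal of the old ZSTD_ROW_HASH_TAG_OFFSET bookkeeping suggests the same), so tag entries cycle backwards through [1, rowEntries). A sketch of the update with simplified names (an assumption-labeled reading of the patch, not upstream's documentation):

    /* Sketch: ring-buffer cursor stored inside slot 0 of the row itself. */
    static unsigned row_next_index(unsigned char* tagRow, unsigned rowMask)
    {
        unsigned next = (*tagRow - 1u) & rowMask;   /* step backwards */
        next += (next == 0) ? rowMask : 0;          /* slot 0 holds the cursor: hop over it */
        *tagRow = (unsigned char)next;              /* persist the cursor in slot 0 */
        return next;
    }

Reserving slot 0 gives up one tag entry per row, but it lets the head live inline with the tags in a single cache line instead of in a separate offset region.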

 /* ZSTD_isAligned():
@@ -840,7 +814,7 @@ MEM_STATIC int ZSTD_isAligned(void const* ptr, size_t align) {
 /* ZSTD_row_prefetch():
  * Performs prefetching for the hashTable and tagTable at a given row.
  */
-FORCE_INLINE_TEMPLATE void ZSTD_row_prefetch(U32 const* hashTable, U16 const* tagTable, U32 const relRow, U32 const rowLog) {
+FORCE_INLINE_TEMPLATE void ZSTD_row_prefetch(U32 const* hashTable, BYTE const* tagTable, U32 const relRow, U32 const rowLog) {
     PREFETCH_L1(hashTable + relRow);
     if (rowLog >= 5) {
         PREFETCH_L1(hashTable + relRow + 16);
@@ -859,18 +833,20 @@ FORCE_INLINE_TEMPLATE void ZSTD_row_prefetch(U32 const* hashTable, U16 const* ta
 * Fill up the hash cache starting at idx, prefetching up to ZSTD_ROW_HASH_CACHE_SIZE entries,
 * but not beyond iLimit.
 */
-FORCE_INLINE_TEMPLATE void ZSTD_row_fillHashCache(ZSTD_matchState_t* ms, const BYTE* base,
+FORCE_INLINE_TEMPLATE
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+void ZSTD_row_fillHashCache(ZSTD_MatchState_t* ms, const BYTE* base,
                                    U32 const rowLog, U32 const mls,
                                    U32 idx, const BYTE* const iLimit)
 {
     U32 const* const hashTable = ms->hashTable;
-    U16 const* const tagTable = ms->tagTable;
+    BYTE const* const tagTable = ms->tagTable;
     U32 const hashLog = ms->rowHashLog;
     U32 const maxElemsToPrefetch = (base + idx) > iLimit ? 0 : (U32)(iLimit - (base + idx) + 1);
     U32 const lim = idx + MIN(ZSTD_ROW_HASH_CACHE_SIZE, maxElemsToPrefetch);

     for (; idx < lim; ++idx) {
-        U32 const hash = (U32)ZSTD_hashPtr(base + idx, hashLog + ZSTD_ROW_HASH_TAG_BITS, mls);
+        U32 const hash = (U32)ZSTD_hashPtrSalted(base + idx, hashLog + ZSTD_ROW_HASH_TAG_BITS, mls, ms->hashSalt);
         U32 const row = (hash >> ZSTD_ROW_HASH_TAG_BITS) << rowLog;
         ZSTD_row_prefetch(hashTable, tagTable, row, rowLog);
         ms->hashCache[idx & ZSTD_ROW_HASH_CACHE_MASK] = hash;
@@ -885,12 +861,15 @@ FORCE_INLINE_TEMPLATE void ZSTD_row_fillHashCache(ZSTD_matchState_t* ms, const B
 * Returns the hash of base + idx, and replaces the hash in the hash cache with the byte at
 * base + idx + ZSTD_ROW_HASH_CACHE_SIZE. Also prefetches the appropriate rows from hashTable and tagTable.
 */
-FORCE_INLINE_TEMPLATE U32 ZSTD_row_nextCachedHash(U32* cache, U32 const* hashTable,
-                                                  U16 const* tagTable, BYTE const* base,
+FORCE_INLINE_TEMPLATE
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+U32 ZSTD_row_nextCachedHash(U32* cache, U32 const* hashTable,
+                                                  BYTE const* tagTable, BYTE const* base,
                                                   U32 idx, U32 const hashLog,
-                                                  U32 const rowLog, U32 const mls)
+                                                  U32 const rowLog, U32 const mls,
+                                                  U64 const hashSalt)
 {
-    U32 const newHash = (U32)ZSTD_hashPtr(base+idx+ZSTD_ROW_HASH_CACHE_SIZE, hashLog + ZSTD_ROW_HASH_TAG_BITS, mls);
+    U32 const newHash = (U32)ZSTD_hashPtrSalted(base+idx+ZSTD_ROW_HASH_CACHE_SIZE, hashLog + ZSTD_ROW_HASH_TAG_BITS, mls, hashSalt);
     U32 const row = (newHash >> ZSTD_ROW_HASH_TAG_BITS) << rowLog;
     ZSTD_row_prefetch(hashTable, tagTable, row, rowLog);
     {   U32 const hash = cache[idx & ZSTD_ROW_HASH_CACHE_MASK];
@@ -902,28 +881,29 @@ FORCE_INLINE_TEMPLATE U32 ZSTD_row_nextCachedHash(U32* cache, U32 const* hashTab
 /* ZSTD_row_update_internalImpl():
 * Updates the hash table with positions starting from updateStartIdx until updateEndIdx.
 */
-FORCE_INLINE_TEMPLATE void ZSTD_row_update_internalImpl(ZSTD_matchState_t* ms,
-                                                        U32 updateStartIdx, U32 const updateEndIdx,
-                                                        U32 const mls, U32 const rowLog,
-                                                        U32 const rowMask, U32 const useCache)
+FORCE_INLINE_TEMPLATE
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
+void ZSTD_row_update_internalImpl(ZSTD_MatchState_t* ms,
+                                  U32 updateStartIdx, U32 const updateEndIdx,
+                                  U32 const mls, U32 const rowLog,
+                                  U32 const rowMask, U32 const useCache)
 {
     U32* const hashTable = ms->hashTable;
-    U16* const tagTable = ms->tagTable;
+    BYTE* const tagTable = ms->tagTable;
     U32 const hashLog = ms->rowHashLog;
     const BYTE* const base = ms->window.base;

     DEBUGLOG(6, "ZSTD_row_update_internalImpl(): updateStartIdx=%u, updateEndIdx=%u", updateStartIdx, updateEndIdx);
     for (; updateStartIdx < updateEndIdx; ++updateStartIdx) {
-        U32 const hash = useCache ? ZSTD_row_nextCachedHash(ms->hashCache, hashTable, tagTable, base, updateStartIdx, hashLog, rowLog, mls)
-                                  : (U32)ZSTD_hashPtr(base + updateStartIdx, hashLog + ZSTD_ROW_HASH_TAG_BITS, mls);
+        U32 const hash = useCache ? ZSTD_row_nextCachedHash(ms->hashCache, hashTable, tagTable, base, updateStartIdx, hashLog, rowLog, mls, ms->hashSalt)
+                                  : (U32)ZSTD_hashPtrSalted(base + updateStartIdx, hashLog + ZSTD_ROW_HASH_TAG_BITS, mls, ms->hashSalt);
         U32 const relRow = (hash >> ZSTD_ROW_HASH_TAG_BITS) << rowLog;
         U32* const row = hashTable + relRow;
-        BYTE* tagRow = (BYTE*)(tagTable + relRow);  /* Though tagTable is laid out as a table of U16, each tag is only 1 byte.
- Explicit cast allow= s us to get exact desired position within each row */ + BYTE* tagRow =3D tagTable + relRow; U32 const pos =3D ZSTD_row_nextIndex(tagRow, rowMask); =20 - assert(hash =3D=3D ZSTD_hashPtr(base + updateStartIdx, hashLog + Z= STD_ROW_HASH_TAG_BITS, mls)); - ((BYTE*)tagRow)[pos + ZSTD_ROW_HASH_TAG_OFFSET] =3D hash & ZSTD_RO= W_HASH_TAG_MASK; + assert(hash =3D=3D ZSTD_hashPtrSalted(base + updateStartIdx, hashL= og + ZSTD_ROW_HASH_TAG_BITS, mls, ms->hashSalt)); + tagRow[pos] =3D hash & ZSTD_ROW_HASH_TAG_MASK; row[pos] =3D updateStartIdx; } } @@ -932,9 +912,11 @@ FORCE_INLINE_TEMPLATE void ZSTD_row_update_internalImp= l(ZSTD_matchState_t* ms, * Inserts the byte at ip into the appropriate position in the hash table,= and updates ms->nextToUpdate. * Skips sections of long matches as is necessary. */ -FORCE_INLINE_TEMPLATE void ZSTD_row_update_internal(ZSTD_matchState_t* ms,= const BYTE* ip, - U32 const mls, U32 con= st rowLog, - U32 const rowMask, U32= const useCache) +FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +void ZSTD_row_update_internal(ZSTD_MatchState_t* ms, const BYTE* ip, + U32 const mls, U32 const rowLog, + U32 const rowMask, U32 const useCache) { U32 idx =3D ms->nextToUpdate; const BYTE* const base =3D ms->window.base; @@ -965,13 +947,41 @@ FORCE_INLINE_TEMPLATE void ZSTD_row_update_internal(Z= STD_matchState_t* ms, const * External wrapper for ZSTD_row_update_internal(). Used for filling the h= ashtable during dictionary * processing. */ -void ZSTD_row_update(ZSTD_matchState_t* const ms, const BYTE* ip) { +void ZSTD_row_update(ZSTD_MatchState_t* const ms, const BYTE* ip) { const U32 rowLog =3D BOUNDED(4, ms->cParams.searchLog, 6); const U32 rowMask =3D (1u << rowLog) - 1; const U32 mls =3D MIN(ms->cParams.minMatch, 6 /* mls caps out at 6 */); =20 DEBUGLOG(5, "ZSTD_row_update(), rowLog=3D%u", rowLog); - ZSTD_row_update_internal(ms, ip, mls, rowLog, rowMask, 0 /* dont use c= ache */); + ZSTD_row_update_internal(ms, ip, mls, rowLog, rowMask, 0 /* don't use = cache */); +} + +/* Returns the number of mask bits that represent a single row entry (the + * "group width"). Not all architectures have an easy movemask instruction, + * so giving each entry its own group of bits makes the match mask easier + * and faster to iterate over. + */ +FORCE_INLINE_TEMPLATE U32 +ZSTD_row_matchMaskGroupWidth(const U32 rowEntries) +{ + assert((rowEntries =3D=3D 16) || (rowEntries =3D=3D 32) || rowEntries = =3D=3D 64); + assert(rowEntries <=3D ZSTD_ROW_HASH_MAX_ENTRIES); + (void)rowEntries; +#if defined(ZSTD_ARCH_ARM_NEON) + /* NEON path only works for little endian */ + if (!MEM_isLittleEndian()) { + return 1; + } + if (rowEntries =3D=3D 16) { + return 4; + } + if (rowEntries =3D=3D 32) { + return 2; + } + if (rowEntries =3D=3D 64) { + return 1; + } +#endif + return 1; } =20 #if defined(ZSTD_ARCH_X86_SSE2) @@ -994,71 +1004,82 @@ ZSTD_row_getSSEMask(int nbChunks, const BYTE* const = src, const BYTE tag, const U } #endif =20 -/* Returns a ZSTD_VecMask (U32) that has the nth bit set to 1 if the newly= -computed "tag" matches - * the hash at the nth position in a row of the tagTable. - * Each row is a circular buffer beginning at the value of "head".
So we m= ust rotate the "matches" bitfield - * to match up with the actual layout of the entries within the hashTable = */ +#if defined(ZSTD_ARCH_ARM_NEON) +FORCE_INLINE_TEMPLATE ZSTD_VecMask +ZSTD_row_getNEONMask(const U32 rowEntries, const BYTE* const src, const BY= TE tag, const U32 headGrouped) +{ + assert((rowEntries =3D=3D 16) || (rowEntries =3D=3D 32) || rowEntries = =3D=3D 64); + if (rowEntries =3D=3D 16) { + /* vshrn_n_u16 shifts by 4 every u16 and narrows to 8 lower bits. + * After that groups of 4 bits represent the equalMask. We lower + * all bits except the highest in these groups by doing AND with + * 0x88 =3D 0b10001000. + */ + const uint8x16_t chunk =3D vld1q_u8(src); + const uint16x8_t equalMask =3D vreinterpretq_u16_u8(vceqq_u8(chunk= , vdupq_n_u8(tag))); + const uint8x8_t res =3D vshrn_n_u16(equalMask, 4); + const U64 matches =3D vget_lane_u64(vreinterpret_u64_u8(res), 0); + return ZSTD_rotateRight_U64(matches, headGrouped) & 0x888888888888= 8888ull; + } else if (rowEntries =3D=3D 32) { + /* Same idea as with rowEntries =3D=3D 16 but doing AND with + * 0x55 =3D 0b01010101. + */ + const uint16x8x2_t chunk =3D vld2q_u16((const uint16_t*)(const voi= d*)src); + const uint8x16_t chunk0 =3D vreinterpretq_u8_u16(chunk.val[0]); + const uint8x16_t chunk1 =3D vreinterpretq_u8_u16(chunk.val[1]); + const uint8x16_t dup =3D vdupq_n_u8(tag); + const uint8x8_t t0 =3D vshrn_n_u16(vreinterpretq_u16_u8(vceqq_u8(c= hunk0, dup)), 6); + const uint8x8_t t1 =3D vshrn_n_u16(vreinterpretq_u16_u8(vceqq_u8(c= hunk1, dup)), 6); + const uint8x8_t res =3D vsli_n_u8(t0, t1, 4); + const U64 matches =3D vget_lane_u64(vreinterpret_u64_u8(res), 0) ; + return ZSTD_rotateRight_U64(matches, headGrouped) & 0x555555555555= 5555ull; + } else { /* rowEntries =3D=3D 64 */ + const uint8x16x4_t chunk =3D vld4q_u8(src); + const uint8x16_t dup =3D vdupq_n_u8(tag); + const uint8x16_t cmp0 =3D vceqq_u8(chunk.val[0], dup); + const uint8x16_t cmp1 =3D vceqq_u8(chunk.val[1], dup); + const uint8x16_t cmp2 =3D vceqq_u8(chunk.val[2], dup); + const uint8x16_t cmp3 =3D vceqq_u8(chunk.val[3], dup); + + const uint8x16_t t0 =3D vsriq_n_u8(cmp1, cmp0, 1); + const uint8x16_t t1 =3D vsriq_n_u8(cmp3, cmp2, 1); + const uint8x16_t t2 =3D vsriq_n_u8(t1, t0, 2); + const uint8x16_t t3 =3D vsriq_n_u8(t2, t2, 4); + const uint8x8_t t4 =3D vshrn_n_u16(vreinterpretq_u16_u8(t3), 4); + const U64 matches =3D vget_lane_u64(vreinterpret_u64_u8(t4), 0); + return ZSTD_rotateRight_U64(matches, headGrouped); + } +} +#endif + +/* Returns a ZSTD_VecMask (U64) that has the nth group (determined by + * ZSTD_row_matchMaskGroupWidth) of bits set to 1 if the newly-computed "t= ag" + * matches the hash at the nth position in a row of the tagTable. + * Each row is a circular buffer beginning at the value of "headGrouped". 
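
The "grouped" bookkeeping above can be made concrete with a small example; the helper below mirrors the ((headGrouped + ZSTD_VecMask_next(matches)) / groupWidth) & rowMask expression used in ZSTD_RowFindBestMatch() further down (helper name invented for the sketch):

    #include <assert.h>
    #include <stdint.h>

    /* groupWidth is 1 for SSE2/SWAR masks; NEON rows of 16 or 32 entries
     * give each entry a group of 4 or 2 mask bits respectively. */
    static uint32_t entryFromGroupedBit(uint32_t headGrouped, uint32_t bitPos,
                                        uint32_t groupWidth, uint32_t rowMask)
    {
        return ((headGrouped + bitPos) / groupWidth) & rowMask;
    }

    int main(void)
    {
        /* 16-entry row on NEON: head at entry 5 => headGrouped = 20; a
         * match bit 8 grouped positions later maps to entry (20+8)/4 = 7. */
        assert(entryFromGroupedBit(20, 8, 4, 15) == 7);
        return 0;
    }
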
= So we + * must rotate the "matches" bitfield to match up with the actual layout o= f the + * entries within the hashTable */ FORCE_INLINE_TEMPLATE ZSTD_VecMask -ZSTD_row_getMatchMask(const BYTE* const tagRow, const BYTE tag, const U32 = head, const U32 rowEntries) +ZSTD_row_getMatchMask(const BYTE* const tagRow, const BYTE tag, const U32 = headGrouped, const U32 rowEntries) { - const BYTE* const src =3D tagRow + ZSTD_ROW_HASH_TAG_OFFSET; + const BYTE* const src =3D tagRow; assert((rowEntries =3D=3D 16) || (rowEntries =3D=3D 32) || rowEntries = =3D=3D 64); assert(rowEntries <=3D ZSTD_ROW_HASH_MAX_ENTRIES); + assert(ZSTD_row_matchMaskGroupWidth(rowEntries) * rowEntries <=3D size= of(ZSTD_VecMask) * 8); =20 #if defined(ZSTD_ARCH_X86_SSE2) =20 - return ZSTD_row_getSSEMask(rowEntries / 16, src, tag, head); + return ZSTD_row_getSSEMask(rowEntries / 16, src, tag, headGrouped); =20 #else /* SW or NEON-LE */ =20 # if defined(ZSTD_ARCH_ARM_NEON) /* This NEON path only works for little endian - otherwise use SWAR belo= w */ if (MEM_isLittleEndian()) { - if (rowEntries =3D=3D 16) { - const uint8x16_t chunk =3D vld1q_u8(src); - const uint16x8_t equalMask =3D vreinterpretq_u16_u8(vceqq_u8(c= hunk, vdupq_n_u8(tag))); - const uint16x8_t t0 =3D vshlq_n_u16(equalMask, 7); - const uint32x4_t t1 =3D vreinterpretq_u32_u16(vsriq_n_u16(t0, = t0, 14)); - const uint64x2_t t2 =3D vreinterpretq_u64_u32(vshrq_n_u32(t1, = 14)); - const uint8x16_t t3 =3D vreinterpretq_u8_u64(vsraq_n_u64(t2, t= 2, 28)); - const U16 hi =3D (U16)vgetq_lane_u8(t3, 8); - const U16 lo =3D (U16)vgetq_lane_u8(t3, 0); - return ZSTD_rotateRight_U16((hi << 8) | lo, head); - } else if (rowEntries =3D=3D 32) { - const uint16x8x2_t chunk =3D vld2q_u16((const U16*)(const void= *)src); - const uint8x16_t chunk0 =3D vreinterpretq_u8_u16(chunk.val[0]); - const uint8x16_t chunk1 =3D vreinterpretq_u8_u16(chunk.val[1]); - const uint8x16_t equalMask0 =3D vceqq_u8(chunk0, vdupq_n_u8(ta= g)); - const uint8x16_t equalMask1 =3D vceqq_u8(chunk1, vdupq_n_u8(ta= g)); - const int8x8_t pack0 =3D vqmovn_s16(vreinterpretq_s16_u8(equal= Mask0)); - const int8x8_t pack1 =3D vqmovn_s16(vreinterpretq_s16_u8(equal= Mask1)); - const uint8x8_t t0 =3D vreinterpret_u8_s8(pack0); - const uint8x8_t t1 =3D vreinterpret_u8_s8(pack1); - const uint8x8_t t2 =3D vsri_n_u8(t1, t0, 2); - const uint8x8x2_t t3 =3D vuzp_u8(t2, t0); - const uint8x8_t t4 =3D vsri_n_u8(t3.val[1], t3.val[0], 4); - const U32 matches =3D vget_lane_u32(vreinterpret_u32_u8(t4), 0= ); - return ZSTD_rotateRight_U32(matches, head); - } else { /* rowEntries =3D=3D 64 */ - const uint8x16x4_t chunk =3D vld4q_u8(src); - const uint8x16_t dup =3D vdupq_n_u8(tag); - const uint8x16_t cmp0 =3D vceqq_u8(chunk.val[0], dup); - const uint8x16_t cmp1 =3D vceqq_u8(chunk.val[1], dup); - const uint8x16_t cmp2 =3D vceqq_u8(chunk.val[2], dup); - const uint8x16_t cmp3 =3D vceqq_u8(chunk.val[3], dup); - - const uint8x16_t t0 =3D vsriq_n_u8(cmp1, cmp0, 1); - const uint8x16_t t1 =3D vsriq_n_u8(cmp3, cmp2, 1); - const uint8x16_t t2 =3D vsriq_n_u8(t1, t0, 2); - const uint8x16_t t3 =3D vsriq_n_u8(t2, t2, 4); - const uint8x8_t t4 =3D vshrn_n_u16(vreinterpretq_u16_u8(t3), 4= ); - const U64 matches =3D vget_lane_u64(vreinterpret_u64_u8(t4), 0= ); - return ZSTD_rotateRight_U64(matches, head); - } + return ZSTD_row_getNEONMask(rowEntries, src, tag, headGrouped); } # endif /* ZSTD_ARCH_ARM_NEON */ /* SWAR */ - { const size_t chunkSize =3D sizeof(size_t); + { const int chunkSize =3D sizeof(size_t); const size_t shiftAmount =3D ((chunkSize * 8) - 
chunkSize); const size_t xFF =3D ~((size_t)0); const size_t x01 =3D xFF / 0xFF; @@ -1091,11 +1112,11 @@ ZSTD_row_getMatchMask(const BYTE* const tagRow, con= st BYTE tag, const U32 head, } matches =3D ~matches; if (rowEntries =3D=3D 16) { - return ZSTD_rotateRight_U16((U16)matches, head); + return ZSTD_rotateRight_U16((U16)matches, headGrouped); } else if (rowEntries =3D=3D 32) { - return ZSTD_rotateRight_U32((U32)matches, head); + return ZSTD_rotateRight_U32((U32)matches, headGrouped); } else { - return ZSTD_rotateRight_U64((U64)matches, head); + return ZSTD_rotateRight_U64((U64)matches, headGrouped); } } #endif @@ -1103,29 +1124,30 @@ ZSTD_row_getMatchMask(const BYTE* const tagRow, con= st BYTE tag, const U32 head, =20 /* The high-level approach of the SIMD row based match finder is as follow= s: * - Figure out where to insert the new entry: - * - Generate a hash from a byte along with an additional 1-byte "sho= rt hash". The additional byte is our "tag" - * - The hashTable is effectively split into groups or "rows" of 16 o= r 32 entries of U32, and the hash determines + * - Generate a hash for the current input position and split it into a one-byte tag and `rowHashLog` bits of index. + * - The hash is salted by a value that changes on every context= reset, so when the same table is used + * we will avoid collisions that would otherwise slow us down = by introducing phantom matches. + * - The hashTable is effectively split into groups or "rows" of 15 o= r 31 entries of U32, and the index determines * which row to insert into. - * - Determine the correct position within the row to insert the entr= y into. Each row of 16 or 32 can - * be considered as a circular buffer with a "head" index that resi= des in the tagTable. - * - Also insert the "tag" into the equivalent row and position in th= e tagTable. - * - Note: The tagTable has 17 or 33 1-byte entries per row, due = to 16 or 32 tags, and 1 "head" entry. - * The 17 or 33 entry rows are spaced out to occur every = 32 or 64 bytes, respectively, - * for alignment/performance reasons, leaving some bytes = unused. - * - Use SIMD to efficiently compare the tags in the tagTable to the 1-byt= e "short hash" and + * - Determine the correct position within the row to insert the entr= y into. Each row of 15 or 31 can + * be considered as a circular buffer with a "head" index that resi= des in the tagTable (overall 16 or 32 bytes + * per row). + * - Use SIMD to efficiently compare the tags in the tagTable to the 1-byt= e tag calculated for the position and * generate a bitfield that we can cycle through to check the collisions= in the hash table. * - Pick the longest match. + * - Insert the tag into the equivalent row and position in the tagTable. */
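
As a rough standalone model of the scheme just described (scalar only; no salting, SIMD, or prefetching, and all names invented for the sketch):

    #include <stdint.h>

    #define ROW_ENTRIES 16             /* 16 index slots + 16 tag bytes */
    #define ROW_MASK    (ROW_ENTRIES - 1)

    typedef struct {
        uint32_t idx[ROW_ENTRIES];     /* positions in the input window */
        uint8_t  tag[ROW_ENTRIES];     /* 1-byte tags; tag[0] is the "head" */
    } ToyRow;

    /* Insert: step the circular head backwards, skipping slot 0, which is
     * reserved for the head byte itself (mirrors ZSTD_row_nextIndex). */
    static void toyRowInsert(ToyRow* row, uint8_t tag, uint32_t position)
    {
        uint32_t next = (uint32_t)(row->tag[0] - 1) & ROW_MASK;
        if (next == 0) next = ROW_MASK;    /* skip first position */
        row->tag[0]    = (uint8_t)next;
        row->tag[next] = tag;
        row->idx[next] = position;
    }

    /* Search: collect candidates whose tag matches; the real code builds
     * this set as a SIMD bitmask and walks it newest-first. */
    static int toyRowSearch(const ToyRow* row, uint8_t tag,
                            uint32_t* candidates, int maxCandidates)
    {
        int n = 0;
        uint32_t e;
        for (e = 1; e < ROW_ENTRIES && n < maxCandidates; ++e) {
            if (row->tag[e] == tag) candidates[n++] = row->idx[e];
        }
        return n;
    }

Because the vector compare inspects all 16 tag bytes, including the head byte in slot 0, the search loops below discard matchPos == 0; that is also why insertion only cycles through [1, rowMask].
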
FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR size_t ZSTD_RowFindBestMatch( - ZSTD_matchState_t* ms, + ZSTD_MatchState_t* ms, const BYTE* const ip, const BYTE* const iLimit, size_t* offsetPtr, const U32 mls, const ZSTD_dictMode_e dictMode, const U32 rowLog) { U32* const hashTable =3D ms->hashTable; - U16* const tagTable =3D ms->tagTable; + BYTE* const tagTable =3D ms->tagTable; U32* const hashCache =3D ms->hashCache; const U32 hashLog =3D ms->rowHashLog; const ZSTD_compressionParameters* const cParams =3D &ms->cParams; @@ -1143,11 +1165,14 @@ size_t ZSTD_RowFindBestMatch( const U32 rowEntries =3D (1U << rowLog); const U32 rowMask =3D rowEntries - 1; const U32 cappedSearchLog =3D MIN(cParams->searchLog, rowLog); /* nb o= f searches is capped at nb entries per row */ + const U32 groupWidth =3D ZSTD_row_matchMaskGroupWidth(rowEntries); + const U64 hashSalt =3D ms->hashSalt; U32 nbAttempts =3D 1U << cappedSearchLog; size_t ml=3D4-1; + U32 hash; =20 /* DMS/DDS variables that may be referenced later */ - const ZSTD_matchState_t* const dms =3D ms->dictMatchState; + const ZSTD_MatchState_t* const dms =3D ms->dictMatchState; =20 /* Initialize the following variables to satisfy static analyzer */ size_t ddsIdx =3D 0; @@ -1168,7 +1193,7 @@ size_t ZSTD_RowFindBestMatch( if (dictMode =3D=3D ZSTD_dictMatchState) { /* Prefetch DMS rows */ U32* const dmsHashTable =3D dms->hashTable; - U16* const dmsTagTable =3D dms->tagTable; + BYTE* const dmsTagTable =3D dms->tagTable; U32 const dmsHash =3D (U32)ZSTD_hashPtr(ip, dms->rowHashLog + ZSTD= _ROW_HASH_TAG_BITS, mls); U32 const dmsRelRow =3D (dmsHash >> ZSTD_ROW_HASH_TAG_BITS) << row= Log; dmsTag =3D dmsHash & ZSTD_ROW_HASH_TAG_MASK; @@ -1178,23 +1203,34 @@ size_t ZSTD_RowFindBestMatch( } =20 /* Update the hashTable and tagTable up to (but not including) ip */ - ZSTD_row_update_internal(ms, ip, mls, rowLog, rowMask, 1 /* useCache *= /); + if (!ms->lazySkipping) { + ZSTD_row_update_internal(ms, ip, mls, rowLog, rowMask, 1 /* useCac= he */); + hash =3D ZSTD_row_nextCachedHash(hashCache, hashTable, tagTable, b= ase, curr, hashLog, rowLog, mls, hashSalt); + } else { + /* Stop inserting every position when in the lazy skipping mode. + * The hash cache is also not kept up to date in this mode.
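
The skipping cutoff follows directly from the step formula used where this mode is entered. A quick check of that arithmetic, assuming kSearchStrength and kLazySkippingStep are both 8, which is what makes the 2KB figure in the comment come out:

    #include <assert.h>

    #define kSearchStrength   8   /* assumed value */
    #define kLazySkippingStep 8   /* assumed value */

    int main(void)
    {
        /* step = ((ip - anchor) >> kSearchStrength) + 1 */
        unsigned long const gapOff = 2047, gapOn = 2048;
        assert(((gapOff >> kSearchStrength) + 1) <= kLazySkippingStep);
        assert(((gapOn  >> kSearchStrength) + 1) >  kLazySkippingStep);
        return 0;   /* skipping engages once the gap reaches 2KB */
    }
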
+ */ + hash =3D (U32)ZSTD_hashPtrSalted(ip, hashLog + ZSTD_ROW_HASH_TAG_B= ITS, mls, hashSalt); + ms->nextToUpdate =3D curr; + } + ms->hashSaltEntropy +=3D hash; /* collect salt entropy */ + { /* Get the hash for ip, compute the appropriate row */ - U32 const hash =3D ZSTD_row_nextCachedHash(hashCache, hashTable, t= agTable, base, curr, hashLog, rowLog, mls); U32 const relRow =3D (hash >> ZSTD_ROW_HASH_TAG_BITS) << rowLog; U32 const tag =3D hash & ZSTD_ROW_HASH_TAG_MASK; U32* const row =3D hashTable + relRow; BYTE* tagRow =3D (BYTE*)(tagTable + relRow); - U32 const head =3D *tagRow & rowMask; + U32 const headGrouped =3D (*tagRow & rowMask) * groupWidth; U32 matchBuffer[ZSTD_ROW_HASH_MAX_ENTRIES]; size_t numMatches =3D 0; size_t currMatch =3D 0; - ZSTD_VecMask matches =3D ZSTD_row_getMatchMask(tagRow, (BYTE)tag, = head, rowEntries); + ZSTD_VecMask matches =3D ZSTD_row_getMatchMask(tagRow, (BYTE)tag, = headGrouped, rowEntries); =20 /* Cycle through the matches and prefetch */ - for (; (matches > 0) && (nbAttempts > 0); --nbAttempts, matches &= =3D (matches - 1)) { - U32 const matchPos =3D (head + ZSTD_VecMask_next(matches)) & r= owMask; + for (; (matches > 0) && (nbAttempts > 0); matches &=3D (matches - = 1)) { + U32 const matchPos =3D ((headGrouped + ZSTD_VecMask_next(match= es)) / groupWidth) & rowMask; U32 const matchIndex =3D row[matchPos]; + if(matchPos =3D=3D 0) continue; assert(numMatches < rowEntries); if (matchIndex < lowLimit) break; @@ -1204,13 +1240,14 @@ size_t ZSTD_RowFindBestMatch( PREFETCH_L1(dictBase + matchIndex); } matchBuffer[numMatches++] =3D matchIndex; + --nbAttempts; } =20 /* Speed opt: insert current byte into hashtable too. This allows = us to avoid one iteration of the loop in ZSTD_row_update_internal() at the next search. */ { U32 const pos =3D ZSTD_row_nextIndex(tagRow, rowMask); - tagRow[pos + ZSTD_ROW_HASH_TAG_OFFSET] =3D (BYTE)tag; + tagRow[pos] =3D (BYTE)tag; row[pos] =3D ms->nextToUpdate++; } =20 @@ -1224,7 +1261,8 @@ size_t ZSTD_RowFindBestMatch( if ((dictMode !=3D ZSTD_extDict) || matchIndex >=3D dictLimit)= { const BYTE* const match =3D base + matchIndex; assert(matchIndex >=3D dictLimit); /* ensures this is tr= ue if dictMode !=3D ZSTD_extDict */ - if (match[ml] =3D=3D ip[ml]) /* potentially better */ + /* read 4B starting from (match + ml + 1 - sizeof(U32)) */ + if (MEM_read32(match + ml - 3) =3D=3D MEM_read32(ip + ml -= 3)) /* potentially better */ currentMl =3D ZSTD_count(ip, match, iLimit); } else { const BYTE* const match =3D dictBase + matchIndex; @@ -1236,7 +1274,7 @@ size_t ZSTD_RowFindBestMatch( /* Save best solution */ if (currentMl > ml) { ml =3D currentMl; - *offsetPtr =3D STORE_OFFSET(curr - matchIndex); + *offsetPtr =3D OFFSET_TO_OFFBASE(curr - matchIndex); if (ip+currentMl =3D=3D iLimit) break; /* best possible, a= voids read overflow on next attempt */ } } @@ -1254,19 +1292,21 @@ size_t ZSTD_RowFindBestMatch( const U32 dmsSize =3D (U32)(dmsEnd - dmsBase); const U32 dmsIndexDelta =3D dictLimit - dmsSize; =20 - { U32 const head =3D *dmsTagRow & rowMask; + { U32 const headGrouped =3D (*dmsTagRow & rowMask) * groupWidth; U32 matchBuffer[ZSTD_ROW_HASH_MAX_ENTRIES]; size_t numMatches =3D 0; size_t currMatch =3D 0; - ZSTD_VecMask matches =3D ZSTD_row_getMatchMask(dmsTagRow, (BYT= E)dmsTag, head, rowEntries); + ZSTD_VecMask matches =3D ZSTD_row_getMatchMask(dmsTagRow, (BYT= E)dmsTag, headGrouped, rowEntries); =20 - for (; (matches > 0) && (nbAttempts > 0); --nbAttempts, matche= s &=3D (matches - 1)) { - U32 const matchPos =3D (head + 
ZSTD_VecMask_next(matches))= & rowMask; + for (; (matches > 0) && (nbAttempts > 0); matches &=3D (matche= s - 1)) { + U32 const matchPos =3D ((headGrouped + ZSTD_VecMask_next(m= atches)) / groupWidth) & rowMask; U32 const matchIndex =3D dmsRow[matchPos]; + if(matchPos =3D=3D 0) continue; if (matchIndex < dmsLowestIndex) break; PREFETCH_L1(dmsBase + matchIndex); matchBuffer[numMatches++] =3D matchIndex; + --nbAttempts; } =20 /* Return the longest match */ @@ -1285,7 +1325,7 @@ size_t ZSTD_RowFindBestMatch( if (currentMl > ml) { ml =3D currentMl; assert(curr > matchIndex + dmsIndexDelta); - *offsetPtr =3D STORE_OFFSET(curr - (matchIndex + dmsIn= dexDelta)); + *offsetPtr =3D OFFSET_TO_OFFBASE(curr - (matchIndex + = dmsIndexDelta)); if (ip+currentMl =3D=3D iLimit) break; } } @@ -1301,7 +1341,7 @@ size_t ZSTD_RowFindBestMatch( * ZSTD_searchMax() dispatches to the correct implementation function. * * TODO: The start of the search function involves loading and calculating= a - * bunch of constants from the ZSTD_matchState_t. These computations could= be + * bunch of constants from the ZSTD_MatchState_t. These computations could= be * done in an initialization function, and saved somewhere in the match st= ate. * Then we could pass a pointer to the saved state instead of the match st= ate, * and avoid duplicate computations. @@ -1325,7 +1365,7 @@ size_t ZSTD_RowFindBestMatch( =20 #define GEN_ZSTD_BT_SEARCH_FN(dictMode, mls) = \ ZSTD_SEARCH_FN_ATTRS size_t ZSTD_BT_SEARCH_FN(dictMode, mls)( = \ - ZSTD_matchState_t* ms, = \ + ZSTD_MatchState_t* ms, = \ const BYTE* ip, const BYTE* const iLimit, = \ size_t* offBasePtr) = \ { = \ @@ -1335,7 +1375,7 @@ size_t ZSTD_RowFindBestMatch( =20 #define GEN_ZSTD_HC_SEARCH_FN(dictMode, mls) = \ ZSTD_SEARCH_FN_ATTRS size_t ZSTD_HC_SEARCH_FN(dictMode, mls)( = \ - ZSTD_matchState_t* ms, = \ + ZSTD_MatchState_t* ms, = \ const BYTE* ip, const BYTE* const iLimit, = \ size_t* offsetPtr) = \ { = \ @@ -1345,7 +1385,7 @@ size_t ZSTD_RowFindBestMatch( =20 #define GEN_ZSTD_ROW_SEARCH_FN(dictMode, mls, rowLog) = \ ZSTD_SEARCH_FN_ATTRS size_t ZSTD_ROW_SEARCH_FN(dictMode, mls, rowLog)(= \ - ZSTD_matchState_t* ms, = \ + ZSTD_MatchState_t* ms, = \ const BYTE* ip, const BYTE* const iLimit, = \ size_t* offsetPtr) = \ { = \ @@ -1446,7 +1486,7 @@ typedef enum { search_hashChain=3D0, search_binaryTre= e=3D1, search_rowHash=3D2 } searc * If a match is found its offset is stored in @p offsetPtr. 
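
One rename runs through this whole hunk: STORE_OFFSET()/STORE_REPCODE_1 become OFFSET_TO_OFFBASE()/REPCODE1_TO_OFFBASE, and the search functions report a single "offBase" value through offsetPtr that encodes either a repcode or a real offset. A self-contained sketch of that encoding, assuming the conventional three zstd repcodes (the real macros live in zstd_compress_internal.h; names below are invented):

    #include <assert.h>
    #include <stddef.h>

    #define REP_NUM 3   /* number of repeat offsets (ZSTD_REP_NUM) */

    /* repcodes map to offBase 1..3; real offsets are shifted past them */
    static size_t offsetToOffBase(size_t o)  { assert(o > 0); return o + REP_NUM; }
    static size_t repcode1ToOffBase(void)    { return 1; }
    static int    offBaseIsOffset(size_t ob) { return ob > REP_NUM; }
    static size_t offBaseToOffset(size_t ob) { assert(ob > REP_NUM); return ob - REP_NUM; }

    int main(void)
    {
        size_t const rep = repcode1ToOffBase();
        size_t const off = offsetToOffBase(100);
        assert(!offBaseIsOffset(rep));
        assert(offBaseIsOffset(off) && offBaseToOffset(off) == 100);
        return 0;
    }

Folding both cases into one value lets ZSTD_searchMax() report its result through the single offsetPtr described above.
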
*/ FORCE_INLINE_TEMPLATE size_t ZSTD_searchMax( - ZSTD_matchState_t* ms, + ZSTD_MatchState_t* ms, const BYTE* ip, const BYTE* iend, size_t* offsetPtr, @@ -1472,9 +1512,10 @@ FORCE_INLINE_TEMPLATE size_t ZSTD_searchMax( * Common parser - lazy strategy *********************************/ =20 -FORCE_INLINE_TEMPLATE size_t -ZSTD_compressBlock_lazy_generic( - ZSTD_matchState_t* ms, seqStore_t* seqStore, +FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t ZSTD_compressBlock_lazy_generic( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize, const searchMethod_e searchMethod, const U32 depth, @@ -1491,12 +1532,13 @@ ZSTD_compressBlock_lazy_generic( const U32 mls =3D BOUNDED(4, ms->cParams.minMatch, 6); const U32 rowLog =3D BOUNDED(4, ms->cParams.searchLog, 6); =20 - U32 offset_1 =3D rep[0], offset_2 =3D rep[1], savedOffset=3D0; + U32 offset_1 =3D rep[0], offset_2 =3D rep[1]; + U32 offsetSaved1 =3D 0, offsetSaved2 =3D 0; =20 const int isDMS =3D dictMode =3D=3D ZSTD_dictMatchState; const int isDDS =3D dictMode =3D=3D ZSTD_dedicatedDictSearch; const int isDxS =3D isDMS || isDDS; - const ZSTD_matchState_t* const dms =3D ms->dictMatchState; + const ZSTD_MatchState_t* const dms =3D ms->dictMatchState; const U32 dictLowestIndex =3D isDxS ? dms->window.dictLimit : 0; const BYTE* const dictBase =3D isDxS ? dms->window.base : NULL; const BYTE* const dictLowest =3D isDxS ? dictBase + dictLowestIndex = : NULL; @@ -1512,8 +1554,8 @@ ZSTD_compressBlock_lazy_generic( U32 const curr =3D (U32)(ip - base); U32 const windowLow =3D ZSTD_getLowestPrefixIndex(ms, curr, ms->cP= arams.windowLog); U32 const maxRep =3D curr - windowLow; - if (offset_2 > maxRep) savedOffset =3D offset_2, offset_2 =3D 0; - if (offset_1 > maxRep) savedOffset =3D offset_1, offset_1 =3D 0; + if (offset_2 > maxRep) offsetSaved2 =3D offset_2, offset_2 =3D 0; + if (offset_1 > maxRep) offsetSaved1 =3D offset_1, offset_1 =3D 0; } if (isDxS) { /* dictMatchState repCode checks don't currently handle repCode = =3D=3D 0 @@ -1522,10 +1564,11 @@ ZSTD_compressBlock_lazy_generic( assert(offset_2 <=3D dictAndPrefixLength); } =20 + /* Reset the lazy skipping state */ + ms->lazySkipping =3D 0; + if (searchMethod =3D=3D search_rowHash) { - ZSTD_row_fillHashCache(ms, base, rowLog, - MIN(ms->cParams.minMatch, 6 /* mls caps out at= 6 */), - ms->nextToUpdate, ilimit); + ZSTD_row_fillHashCache(ms, base, rowLog, mls, ms->nextToUpdate, il= imit); } =20 /* Match Loop */ @@ -1537,7 +1580,7 @@ ZSTD_compressBlock_lazy_generic( #endif while (ip < ilimit) { size_t matchLength=3D0; - size_t offcode=3DSTORE_REPCODE_1; + size_t offBase =3D REPCODE1_TO_OFFBASE; const BYTE* start=3Dip+1; DEBUGLOG(7, "search baseline (depth 0)"); =20 @@ -1548,7 +1591,7 @@ ZSTD_compressBlock_lazy_generic( && repIndex < prefixLowestIndex) ? 
dictBase + (repIndex - dictIndexDelta) : base + repIndex; - if (((U32)((prefixLowestIndex-1) - repIndex) >=3D 3 /* intenti= onal underflow */) + if ((ZSTD_index_overlap_check(prefixLowestIndex, repIndex)) && (MEM_read32(repMatch) =3D=3D MEM_read32(ip+1)) ) { const BYTE* repMatchEnd =3D repIndex < prefixLowestIndex ?= dictEnd : iend; matchLength =3D ZSTD_count_2segments(ip+1+4, repMatch+4, i= end, repMatchEnd, prefixLowest) + 4; @@ -1562,14 +1605,23 @@ ZSTD_compressBlock_lazy_generic( } =20 /* first search (depth 0) */ - { size_t offsetFound =3D 999999999; - size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &offsetFound= , mls, rowLog, searchMethod, dictMode); + { size_t offbaseFound =3D 999999999; + size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &offbaseFoun= d, mls, rowLog, searchMethod, dictMode); if (ml2 > matchLength) - matchLength =3D ml2, start =3D ip, offcode=3DoffsetFound; + matchLength =3D ml2, start =3D ip, offBase =3D offbaseFoun= d; } =20 if (matchLength < 4) { - ip +=3D ((ip-anchor) >> kSearchStrength) + 1; /* jump faster= over incompressible sections */ + size_t const step =3D ((size_t)(ip-anchor) >> kSearchStrength)= + 1; /* jump faster over incompressible sections */; + ip +=3D step; + /* Enter the lazy skipping mode once we are skipping more than= 8 bytes at a time. + * In this mode we stop inserting every position into our tabl= es, and only insert + * positions that we search, which is one in step positions. + * The exact cutoff is flexible, I've just chosen a number tha= t is reasonably high, + * so we minimize the compression ratio loss in "normal" scena= rios. This mode gets + * triggered once we've gone 2KB without finding any matches. + */ + ms->lazySkipping =3D step > kLazySkippingStep; continue; } =20 @@ -1579,34 +1631,34 @@ ZSTD_compressBlock_lazy_generic( DEBUGLOG(7, "search depth 1"); ip ++; if ( (dictMode =3D=3D ZSTD_noDict) - && (offcode) && ((offset_1>0) & (MEM_read32(ip) =3D=3D MEM_r= ead32(ip - offset_1)))) { + && (offBase) && ((offset_1>0) & (MEM_read32(ip) =3D=3D MEM_r= ead32(ip - offset_1)))) { size_t const mlRep =3D ZSTD_count(ip+4, ip+4-offset_1, ien= d) + 4; int const gain2 =3D (int)(mlRep * 3); - int const gain1 =3D (int)(matchLength*3 - ZSTD_highbit32((= U32)STORED_TO_OFFBASE(offcode)) + 1); + int const gain1 =3D (int)(matchLength*3 - ZSTD_highbit32((= U32)offBase) + 1); if ((mlRep >=3D 4) && (gain2 > gain1)) - matchLength =3D mlRep, offcode =3D STORE_REPCODE_1, st= art =3D ip; + matchLength =3D mlRep, offBase =3D REPCODE1_TO_OFFBASE= , start =3D ip; } if (isDxS) { const U32 repIndex =3D (U32)(ip - base) - offset_1; const BYTE* repMatch =3D repIndex < prefixLowestIndex ? dictBase + (repIndex - dictIndexDelta) : base + repIndex; - if (((U32)((prefixLowestIndex-1) - repIndex) >=3D 3 /* int= entional underflow */) + if ((ZSTD_index_overlap_check(prefixLowestIndex, repIndex)) && (MEM_read32(repMatch) =3D=3D MEM_read32(ip)) ) { const BYTE* repMatchEnd =3D repIndex < prefixLowestInd= ex ? 
dictEnd : iend; size_t const mlRep =3D ZSTD_count_2segments(ip+4, repM= atch+4, iend, repMatchEnd, prefixLowest) + 4; int const gain2 =3D (int)(mlRep * 3); - int const gain1 =3D (int)(matchLength*3 - ZSTD_highbit= 32((U32)STORED_TO_OFFBASE(offcode)) + 1); + int const gain1 =3D (int)(matchLength*3 - ZSTD_highbit= 32((U32)offBase) + 1); if ((mlRep >=3D 4) && (gain2 > gain1)) - matchLength =3D mlRep, offcode =3D STORE_REPCODE_1= , start =3D ip; + matchLength =3D mlRep, offBase =3D REPCODE1_TO_OFF= BASE, start =3D ip; } } - { size_t offset2=3D999999999; - size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &offset2= , mls, rowLog, searchMethod, dictMode); - int const gain2 =3D (int)(ml2*4 - ZSTD_highbit32((U32)STOR= ED_TO_OFFBASE(offset2))); /* raw approx */ - int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit32((= U32)STORED_TO_OFFBASE(offcode)) + 4); + { size_t ofbCandidate=3D999999999; + size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &ofbCand= idate, mls, rowLog, searchMethod, dictMode); + int const gain2 =3D (int)(ml2*4 - ZSTD_highbit32((U32)ofbC= andidate)); /* raw approx */ + int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit32((= U32)offBase) + 4); if ((ml2 >=3D 4) && (gain2 > gain1)) { - matchLength =3D ml2, offcode =3D offset2, start =3D ip; + matchLength =3D ml2, offBase =3D ofbCandidate, start = =3D ip; continue; /* search a better one */ } } =20 @@ -1615,34 +1667,34 @@ ZSTD_compressBlock_lazy_generic( DEBUGLOG(7, "search depth 2"); ip ++; if ( (dictMode =3D=3D ZSTD_noDict) - && (offcode) && ((offset_1>0) & (MEM_read32(ip) =3D=3D M= EM_read32(ip - offset_1)))) { + && (offBase) && ((offset_1>0) & (MEM_read32(ip) =3D=3D M= EM_read32(ip - offset_1)))) { size_t const mlRep =3D ZSTD_count(ip+4, ip+4-offset_1,= iend) + 4; int const gain2 =3D (int)(mlRep * 4); - int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit= 32((U32)STORED_TO_OFFBASE(offcode)) + 1); + int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit= 32((U32)offBase) + 1); if ((mlRep >=3D 4) && (gain2 > gain1)) - matchLength =3D mlRep, offcode =3D STORE_REPCODE_1= , start =3D ip; + matchLength =3D mlRep, offBase =3D REPCODE1_TO_OFF= BASE, start =3D ip; } if (isDxS) { const U32 repIndex =3D (U32)(ip - base) - offset_1; const BYTE* repMatch =3D repIndex < prefixLowestIndex ? dictBase + (repIndex - dictIndexDelta) : base + repIndex; - if (((U32)((prefixLowestIndex-1) - repIndex) >=3D 3 /*= intentional underflow */) + if ((ZSTD_index_overlap_check(prefixLowestIndex, repIn= dex)) && (MEM_read32(repMatch) =3D=3D MEM_read32(ip)) ) { const BYTE* repMatchEnd =3D repIndex < prefixLowes= tIndex ? 
dictEnd : iend; size_t const mlRep =3D ZSTD_count_2segments(ip+4, = repMatch+4, iend, repMatchEnd, prefixLowest) + 4; int const gain2 =3D (int)(mlRep * 4); - int const gain1 =3D (int)(matchLength*4 - ZSTD_hig= hbit32((U32)STORED_TO_OFFBASE(offcode)) + 1); + int const gain1 =3D (int)(matchLength*4 - ZSTD_hig= hbit32((U32)offBase) + 1); if ((mlRep >=3D 4) && (gain2 > gain1)) - matchLength =3D mlRep, offcode =3D STORE_REPCO= DE_1, start =3D ip; + matchLength =3D mlRep, offBase =3D REPCODE1_TO= _OFFBASE, start =3D ip; } } - { size_t offset2=3D999999999; - size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &off= set2, mls, rowLog, searchMethod, dictMode); - int const gain2 =3D (int)(ml2*4 - ZSTD_highbit32((U32)= STORED_TO_OFFBASE(offset2))); /* raw approx */ - int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit= 32((U32)STORED_TO_OFFBASE(offcode)) + 7); + { size_t ofbCandidate=3D999999999; + size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &ofb= Candidate, mls, rowLog, searchMethod, dictMode); + int const gain2 =3D (int)(ml2*4 - ZSTD_highbit32((U32)= ofbCandidate)); /* raw approx */ + int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit= 32((U32)offBase) + 7); if ((ml2 >=3D 4) && (gain2 > gain1)) { - matchLength =3D ml2, offcode =3D offset2, start = =3D ip; + matchLength =3D ml2, offBase =3D ofbCandidate, sta= rt =3D ip; continue; } } } break; /* nothing found : store previous solution */ @@ -1653,26 +1705,33 @@ ZSTD_compressBlock_lazy_generic( * notably if `value` is unsigned, resulting in a large positive `= -value`. */ /* catch up */ - if (STORED_IS_OFFSET(offcode)) { + if (OFFBASE_IS_OFFSET(offBase)) { if (dictMode =3D=3D ZSTD_noDict) { - while ( ((start > anchor) & (start - STORED_OFFSET(offcode= ) > prefixLowest)) - && (start[-1] =3D=3D (start-STORED_OFFSET(offcode))[-= 1]) ) /* only search for offset within prefix */ + while ( ((start > anchor) & (start - OFFBASE_TO_OFFSET(off= Base) > prefixLowest)) + && (start[-1] =3D=3D (start-OFFBASE_TO_OFFSET(offBase= ))[-1]) ) /* only search for offset within prefix */ { start--; matchLength++; } } if (isDxS) { - U32 const matchIndex =3D (U32)((size_t)(start-base) - STOR= ED_OFFSET(offcode)); + U32 const matchIndex =3D (U32)((size_t)(start-base) - OFFB= ASE_TO_OFFSET(offBase)); const BYTE* match =3D (matchIndex < prefixLowestIndex) ? d= ictBase + matchIndex - dictIndexDelta : base + matchIndex; const BYTE* const mStart =3D (matchIndex < prefixLowestInd= ex) ? dictLowest : prefixLowest; while ((start>anchor) && (match>mStart) && (start[-1] =3D= =3D match[-1])) { start--; match--; matchLength++; } /* catch up */ } - offset_2 =3D offset_1; offset_1 =3D (U32)STORED_OFFSET(offcode= ); + offset_2 =3D offset_1; offset_1 =3D (U32)OFFBASE_TO_OFFSET(off= Base); } /* store sequence */ _storeSequence: { size_t const litLength =3D (size_t)(start - anchor); - ZSTD_storeSeq(seqStore, litLength, anchor, iend, (U32)offcode,= matchLength); + ZSTD_storeSeq(seqStore, litLength, anchor, iend, (U32)offBase,= matchLength); anchor =3D ip =3D start + matchLength; } + if (ms->lazySkipping) { + /* We've found a match, disable lazy skipping mode, and refill= the hash cache. */ + if (searchMethod =3D=3D search_rowHash) { + ZSTD_row_fillHashCache(ms, base, rowLog, mls, ms->nextToUp= date, ilimit); + } + ms->lazySkipping =3D 0; + } =20 /* check immediate repcode */ if (isDxS) { @@ -1682,12 +1741,12 @@ ZSTD_compressBlock_lazy_generic( const BYTE* repMatch =3D repIndex < prefixLowestIndex ? 
dictBase - dictIndexDelta + repIndex : base + repIndex; - if ( ((U32)((prefixLowestIndex-1) - (U32)repIndex) >=3D 3 = /* intentional overflow */) + if ( (ZSTD_index_overlap_check(prefixLowestIndex, repIndex= )) && (MEM_read32(repMatch) =3D=3D MEM_read32(ip)) ) { const BYTE* const repEnd2 =3D repIndex < prefixLowestI= ndex ? dictEnd : iend; matchLength =3D ZSTD_count_2segments(ip+4, repMatch+4,= iend, repEnd2, prefixLowest) + 4; - offcode =3D offset_2; offset_2 =3D offset_1; offset_1 = =3D (U32)offcode; /* swap offset_2 <=3D> offset_1 */ - ZSTD_storeSeq(seqStore, 0, anchor, iend, STORE_REPCODE= _1, matchLength); + offBase =3D offset_2; offset_2 =3D offset_1; offset_1 = =3D (U32)offBase; /* swap offset_2 <=3D> offset_1 */ + ZSTD_storeSeq(seqStore, 0, anchor, iend, REPCODE1_TO_O= FFBASE, matchLength); ip +=3D matchLength; anchor =3D ip; continue; @@ -1701,168 +1760,183 @@ ZSTD_compressBlock_lazy_generic( && (MEM_read32(ip) =3D=3D MEM_read32(ip - offset_2)) ) { /* store sequence */ matchLength =3D ZSTD_count(ip+4, ip+4-offset_2, iend) + 4; - offcode =3D offset_2; offset_2 =3D offset_1; offset_1 =3D = (U32)offcode; /* swap repcodes */ - ZSTD_storeSeq(seqStore, 0, anchor, iend, STORE_REPCODE_1, = matchLength); + offBase =3D offset_2; offset_2 =3D offset_1; offset_1 =3D = (U32)offBase; /* swap repcodes */ + ZSTD_storeSeq(seqStore, 0, anchor, iend, REPCODE1_TO_OFFBA= SE, matchLength); ip +=3D matchLength; anchor =3D ip; continue; /* faster when present ... (?) */ } } } =20 - /* Save reps for next block */ - rep[0] =3D offset_1 ? offset_1 : savedOffset; - rep[1] =3D offset_2 ? offset_2 : savedOffset; + /* If offset_1 started invalid (offsetSaved1 !=3D 0) and became valid = (offset_1 !=3D 0), + * rotate saved offsets. See comment in ZSTD_compressBlock_fast_noDict= for more context. */ + offsetSaved2 =3D ((offsetSaved1 !=3D 0) && (offset_1 !=3D 0)) ? offset= Saved1 : offsetSaved2; + + /* save reps for next block */ + rep[0] =3D offset_1 ? offset_1 : offsetSaved1; + rep[1] =3D offset_2 ? 
offset_2 : offsetSaved2; =20 /* Return the last literals size */ return (size_t)(iend - anchor); } +#endif /* build exclusions */ =20 =20 -size_t ZSTD_compressBlock_btlazy2( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_GREEDY_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_greedy( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_binaryTree, 2, ZSTD_noDict); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 0, ZSTD_noDict); } =20 -size_t ZSTD_compressBlock_lazy2( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 2, ZSTD_noDict); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 0, ZSTD_dictMatchState); } =20 -size_t ZSTD_compressBlock_lazy( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_dedicatedDictSearch( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 1, ZSTD_noDict); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 0, ZSTD_dedicatedDictSearch); } =20 -size_t ZSTD_compressBlock_greedy( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 0, ZSTD_noDict); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 0, ZSTD_noDict); } =20 -size_t ZSTD_compressBlock_btlazy2_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_dictMatchState_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_binaryTree, 2, ZSTD_dictMatchState); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 0, ZSTD_dictMatchState); } =20 -size_t ZSTD_compressBlock_lazy2_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_dedicatedDictSearch_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 2, ZSTD_dictMatchState); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 0, ZSTD_dedicatedDictSearch); } +#endif =20 -size_t ZSTD_compressBlock_lazy_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_LAZY_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_lazy( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, 
src, srcSize= , search_hashChain, 1, ZSTD_dictMatchState); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 1, ZSTD_noDict); } =20 -size_t ZSTD_compressBlock_greedy_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 0, ZSTD_dictMatchState); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 1, ZSTD_dictMatchState); } =20 - -size_t ZSTD_compressBlock_lazy2_dedicatedDictSearch( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_dedicatedDictSearch( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 2, ZSTD_dedicatedDictSearch); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 1, ZSTD_dedicatedDictSearch); } =20 -size_t ZSTD_compressBlock_lazy_dedicatedDictSearch( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 1, ZSTD_dedicatedDictSearch); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 1, ZSTD_noDict); } =20 -size_t ZSTD_compressBlock_greedy_dedicatedDictSearch( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_dictMatchState_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 0, ZSTD_dedicatedDictSearch); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 1, ZSTD_dictMatchState); } =20 -/* Row-based matchfinder */ -size_t ZSTD_compressBlock_lazy2_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_dedicatedDictSearch_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 2, ZSTD_noDict); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 1, ZSTD_dedicatedDictSearch); } +#endif =20 -size_t ZSTD_compressBlock_lazy_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_LAZY2_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_lazy2( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 1, ZSTD_noDict); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 2, ZSTD_noDict); } =20 -size_t ZSTD_compressBlock_greedy_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], 
void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 0, ZSTD_noDict); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 2, ZSTD_dictMatchState); } =20 -size_t ZSTD_compressBlock_lazy2_dictMatchState_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_dedicatedDictSearch( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 2, ZSTD_dictMatchState); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_hashChain, 2, ZSTD_dedicatedDictSearch); } =20 -size_t ZSTD_compressBlock_lazy_dictMatchState_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 1, ZSTD_dictMatchState); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 2, ZSTD_noDict); } =20 -size_t ZSTD_compressBlock_greedy_dictMatchState_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_dictMatchState_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 0, ZSTD_dictMatchState); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 2, ZSTD_dictMatchState); } =20 - size_t ZSTD_compressBlock_lazy2_dedicatedDictSearch_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 2, ZSTD_dedicatedDictSearch); } +#endif =20 -size_t ZSTD_compressBlock_lazy_dedicatedDictSearch_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_btlazy2( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 1, ZSTD_dedicatedDictSearch); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_binaryTree, 2, ZSTD_noDict); } =20 -size_t ZSTD_compressBlock_greedy_dedicatedDictSearch_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_btlazy2_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { - return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_rowHash, 0, ZSTD_dedicatedDictSearch); + return ZSTD_compressBlock_lazy_generic(ms, seqStore, rep, src, srcSize= , search_binaryTree, 2, ZSTD_dictMatchState); } +#endif =20 +#if !defined(ZSTD_EXCLUDE_GREEDY_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_LAZY_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_LAZY2_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR) FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR 
size_t ZSTD_compressBlock_lazy_extDict_generic( - ZSTD_matchState_t* ms, seqStore_t* seqStore, + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize, const searchMethod_e searchMethod, const U32 depth) @@ -1886,12 +1960,13 @@ size_t ZSTD_compressBlock_lazy_extDict_generic( =20 DEBUGLOG(5, "ZSTD_compressBlock_lazy_extDict_generic (searchFunc=3D%u)= ", (U32)searchMethod); =20 + /* Reset the lazy skipping state */ + ms->lazySkipping =3D 0; + /* init */ ip +=3D (ip =3D=3D prefixStart); if (searchMethod =3D=3D search_rowHash) { - ZSTD_row_fillHashCache(ms, base, rowLog, - MIN(ms->cParams.minMatch, 6 /* mls caps out= at 6 */), - ms->nextToUpdate, ilimit); + ZSTD_row_fillHashCache(ms, base, rowLog, mls, ms->nextToUpdate, il= imit); } =20 /* Match Loop */ @@ -1903,7 +1978,7 @@ size_t ZSTD_compressBlock_lazy_extDict_generic( #endif while (ip < ilimit) { size_t matchLength=3D0; - size_t offcode=3DSTORE_REPCODE_1; + size_t offBase =3D REPCODE1_TO_OFFBASE; const BYTE* start=3Dip+1; U32 curr =3D (U32)(ip-base); =20 @@ -1912,7 +1987,7 @@ size_t ZSTD_compressBlock_lazy_extDict_generic( const U32 repIndex =3D (U32)(curr+1 - offset_1); const BYTE* const repBase =3D repIndex < dictLimit ? dictBase = : base; const BYTE* const repMatch =3D repBase + repIndex; - if ( ((U32)((dictLimit-1) - repIndex) >=3D 3) /* intentional o= verflow */ + if ( (ZSTD_index_overlap_check(dictLimit, repIndex)) & (offset_1 <=3D curr+1 - windowLow) ) /* note: we are sear= ching at curr+1 */ if (MEM_read32(ip+1) =3D=3D MEM_read32(repMatch)) { /* repcode detected we should take it */ @@ -1922,14 +1997,23 @@ size_t ZSTD_compressBlock_lazy_extDict_generic( } } =20 /* first search (depth 0) */ - { size_t offsetFound =3D 999999999; - size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &offsetFound= , mls, rowLog, searchMethod, ZSTD_extDict); + { size_t ofbCandidate =3D 999999999; + size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &ofbCandidat= e, mls, rowLog, searchMethod, ZSTD_extDict); if (ml2 > matchLength) - matchLength =3D ml2, start =3D ip, offcode=3DoffsetFound; + matchLength =3D ml2, start =3D ip, offBase =3D ofbCandidat= e; } =20 if (matchLength < 4) { - ip +=3D ((ip-anchor) >> kSearchStrength) + 1; /* jump faster= over incompressible sections */ + size_t const step =3D ((size_t)(ip-anchor) >> kSearchStrength); + ip +=3D step + 1; /* jump faster over incompressible section= s */ + /* Enter the lazy skipping mode once we are skipping more than= 8 bytes at a time. + * In this mode we stop inserting every position into our tabl= es, and only insert + * positions that we search, which is one in step positions. + * The exact cutoff is flexible, I've just chosen a number tha= t is reasonably high, + * so we minimize the compression ratio loss in "normal" scena= rios. This mode gets + * triggered once we've gone 2KB without finding any matches. + */ + ms->lazySkipping =3D step > kLazySkippingStep; continue; } =20 @@ -1939,30 +2023,30 @@ size_t ZSTD_compressBlock_lazy_extDict_generic( ip ++; curr++; /* check repCode */ - if (offcode) { + if (offBase) { const U32 windowLow =3D ZSTD_getLowestMatchIndex(ms, curr,= windowLog); const U32 repIndex =3D (U32)(curr - offset_1); const BYTE* const repBase =3D repIndex < dictLimit ? 
dictB= ase : base; const BYTE* const repMatch =3D repBase + repIndex; - if ( ((U32)((dictLimit-1) - repIndex) >=3D 3) /* intention= al overflow : do not test positions overlapping 2 memory segments */ + if ( (ZSTD_index_overlap_check(dictLimit, repIndex)) & (offset_1 <=3D curr - windowLow) ) /* equivalent to `= curr > repIndex >=3D windowLow` */ if (MEM_read32(ip) =3D=3D MEM_read32(repMatch)) { /* repcode detected */ const BYTE* const repEnd =3D repIndex < dictLimit ? di= ctEnd : iend; size_t const repLength =3D ZSTD_count_2segments(ip+4, = repMatch+4, iend, repEnd, prefixStart) + 4; int const gain2 =3D (int)(repLength * 3); - int const gain1 =3D (int)(matchLength*3 - ZSTD_highbit= 32((U32)STORED_TO_OFFBASE(offcode)) + 1); + int const gain1 =3D (int)(matchLength*3 - ZSTD_highbit= 32((U32)offBase) + 1); if ((repLength >=3D 4) && (gain2 > gain1)) - matchLength =3D repLength, offcode =3D STORE_REPCO= DE_1, start =3D ip; + matchLength =3D repLength, offBase =3D REPCODE1_TO= _OFFBASE, start =3D ip; } } =20 /* search match, depth 1 */ - { size_t offset2=3D999999999; - size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &offset2= , mls, rowLog, searchMethod, ZSTD_extDict); - int const gain2 =3D (int)(ml2*4 - ZSTD_highbit32((U32)STOR= ED_TO_OFFBASE(offset2))); /* raw approx */ - int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit32((= U32)STORED_TO_OFFBASE(offcode)) + 4); + { size_t ofbCandidate =3D 999999999; + size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &ofbCand= idate, mls, rowLog, searchMethod, ZSTD_extDict); + int const gain2 =3D (int)(ml2*4 - ZSTD_highbit32((U32)ofbC= andidate)); /* raw approx */ + int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit32((= U32)offBase) + 4); if ((ml2 >=3D 4) && (gain2 > gain1)) { - matchLength =3D ml2, offcode =3D offset2, start =3D ip; + matchLength =3D ml2, offBase =3D ofbCandidate, start = =3D ip; continue; /* search a better one */ } } =20 @@ -1971,50 +2055,57 @@ size_t ZSTD_compressBlock_lazy_extDict_generic( ip ++; curr++; /* check repCode */ - if (offcode) { + if (offBase) { const U32 windowLow =3D ZSTD_getLowestMatchIndex(ms, c= urr, windowLog); const U32 repIndex =3D (U32)(curr - offset_1); const BYTE* const repBase =3D repIndex < dictLimit ? d= ictBase : base; const BYTE* const repMatch =3D repBase + repIndex; - if ( ((U32)((dictLimit-1) - repIndex) >=3D 3) /* inten= tional overflow : do not test positions overlapping 2 memory segments */ + if ( (ZSTD_index_overlap_check(dictLimit, repIndex)) & (offset_1 <=3D curr - windowLow) ) /* equivalent = to `curr > repIndex >=3D windowLow` */ if (MEM_read32(ip) =3D=3D MEM_read32(repMatch)) { /* repcode detected */ const BYTE* const repEnd =3D repIndex < dictLimit = ? 
dictEnd : iend; size_t const repLength =3D ZSTD_count_2segments(ip= +4, repMatch+4, iend, repEnd, prefixStart) + 4; int const gain2 =3D (int)(repLength * 4); - int const gain1 =3D (int)(matchLength*4 - ZSTD_hig= hbit32((U32)STORED_TO_OFFBASE(offcode)) + 1); + int const gain1 =3D (int)(matchLength*4 - ZSTD_hig= hbit32((U32)offBase) + 1); if ((repLength >=3D 4) && (gain2 > gain1)) - matchLength =3D repLength, offcode =3D STORE_R= EPCODE_1, start =3D ip; + matchLength =3D repLength, offBase =3D REPCODE= 1_TO_OFFBASE, start =3D ip; } } =20 /* search match, depth 2 */ - { size_t offset2=3D999999999; - size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &off= set2, mls, rowLog, searchMethod, ZSTD_extDict); - int const gain2 =3D (int)(ml2*4 - ZSTD_highbit32((U32)= STORED_TO_OFFBASE(offset2))); /* raw approx */ - int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit= 32((U32)STORED_TO_OFFBASE(offcode)) + 7); + { size_t ofbCandidate =3D 999999999; + size_t const ml2 =3D ZSTD_searchMax(ms, ip, iend, &ofb= Candidate, mls, rowLog, searchMethod, ZSTD_extDict); + int const gain2 =3D (int)(ml2*4 - ZSTD_highbit32((U32)= ofbCandidate)); /* raw approx */ + int const gain1 =3D (int)(matchLength*4 - ZSTD_highbit= 32((U32)offBase) + 7); if ((ml2 >=3D 4) && (gain2 > gain1)) { - matchLength =3D ml2, offcode =3D offset2, start = =3D ip; + matchLength =3D ml2, offBase =3D ofbCandidate, sta= rt =3D ip; continue; } } } break; /* nothing found : store previous solution */ } =20 /* catch up */ - if (STORED_IS_OFFSET(offcode)) { - U32 const matchIndex =3D (U32)((size_t)(start-base) - STORED_O= FFSET(offcode)); + if (OFFBASE_IS_OFFSET(offBase)) { + U32 const matchIndex =3D (U32)((size_t)(start-base) - OFFBASE_= TO_OFFSET(offBase)); const BYTE* match =3D (matchIndex < dictLimit) ? dictBase + ma= tchIndex : base + matchIndex; const BYTE* const mStart =3D (matchIndex < dictLimit) ? dictSt= art : prefixStart; while ((start>anchor) && (match>mStart) && (start[-1] =3D=3D m= atch[-1])) { start--; match--; matchLength++; } /* catch up */ - offset_2 =3D offset_1; offset_1 =3D (U32)STORED_OFFSET(offcode= ); + offset_2 =3D offset_1; offset_1 =3D (U32)OFFBASE_TO_OFFSET(off= Base); } =20 /* store sequence */ _storeSequence: { size_t const litLength =3D (size_t)(start - anchor); - ZSTD_storeSeq(seqStore, litLength, anchor, iend, (U32)offcode,= matchLength); + ZSTD_storeSeq(seqStore, litLength, anchor, iend, (U32)offBase,= matchLength); anchor =3D ip =3D start + matchLength; } + if (ms->lazySkipping) { + /* We've found a match, disable lazy skipping mode, and refill= the hash cache. */ + if (searchMethod =3D=3D search_rowHash) { + ZSTD_row_fillHashCache(ms, base, rowLog, mls, ms->nextToUp= date, ilimit); + } + ms->lazySkipping =3D 0; + } =20 /* check immediate repcode */ while (ip <=3D ilimit) { @@ -2023,14 +2114,14 @@ size_t ZSTD_compressBlock_lazy_extDict_generic( const U32 repIndex =3D repCurrent - offset_2; const BYTE* const repBase =3D repIndex < dictLimit ? dictBase = : base; const BYTE* const repMatch =3D repBase + repIndex; - if ( ((U32)((dictLimit-1) - repIndex) >=3D 3) /* intentional o= verflow : do not test positions overlapping 2 memory segments */ + if ( (ZSTD_index_overlap_check(dictLimit, repIndex)) & (offset_2 <=3D repCurrent - windowLow) ) /* equivalent to= `curr > repIndex >=3D windowLow` */ if (MEM_read32(ip) =3D=3D MEM_read32(repMatch)) { /* repcode detected we should take it */ const BYTE* const repEnd =3D repIndex < dictLimit ? 
dictEn= d : iend; matchLength =3D ZSTD_count_2segments(ip+4, repMatch+4, ien= d, repEnd, prefixStart) + 4; - offcode =3D offset_2; offset_2 =3D offset_1; offset_1 =3D = (U32)offcode; /* swap offset history */ - ZSTD_storeSeq(seqStore, 0, anchor, iend, STORE_REPCODE_1, = matchLength); + offBase =3D offset_2; offset_2 =3D offset_1; offset_1 =3D = (U32)offBase; /* swap offset history */ + ZSTD_storeSeq(seqStore, 0, anchor, iend, REPCODE1_TO_OFFBA= SE, matchLength); ip +=3D matchLength; anchor =3D ip; continue; /* faster when present ... (?) */ @@ -2045,58 +2136,65 @@ size_t ZSTD_compressBlock_lazy_extDict_generic( /* Return the last literals size */ return (size_t)(iend - anchor); } +#endif /* build exclusions */ =20 - +#ifndef ZSTD_EXCLUDE_GREEDY_BLOCK_COMPRESSOR size_t ZSTD_compressBlock_greedy_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) { return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_hashChain, 0); } =20 -size_t ZSTD_compressBlock_lazy_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_extDict_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) - { - return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_hashChain, 1); + return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_rowHash, 0); } +#endif =20 -size_t ZSTD_compressBlock_lazy2_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_LAZY_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_lazy_extDict( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) =20 { - return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_hashChain, 2); + return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_hashChain, 1); } =20 -size_t ZSTD_compressBlock_btlazy2_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_extDict_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) =20 { - return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_binaryTree, 2); + return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_rowHash, 1); } +#endif =20 -size_t ZSTD_compressBlock_greedy_extDict_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_LAZY2_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_lazy2_extDict( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) + { - return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_rowHash, 0); + return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_hashChain, 2); } =20 -size_t ZSTD_compressBlock_lazy_extDict_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_extDict_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) - { - return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_rowHash, 1); + return ZSTD_compressBlock_lazy_extDict_generic(ms, 
seqStore, rep, src,= srcSize, search_rowHash, 2); } +#endif =20 -size_t ZSTD_compressBlock_lazy2_extDict_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_btlazy2_extDict( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize) =20 { - return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_rowHash, 2); + return ZSTD_compressBlock_lazy_extDict_generic(ms, seqStore, rep, src,= srcSize, search_binaryTree, 2); } +#endif diff --git a/lib/zstd/compress/zstd_lazy.h b/lib/zstd/compress/zstd_lazy.h index e5bdf4df8dde..987a036d8bde 100644 --- a/lib/zstd/compress/zstd_lazy.h +++ b/lib/zstd/compress/zstd_lazy.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -11,7 +12,6 @@ #ifndef ZSTD_LAZY_H #define ZSTD_LAZY_H =20 - #include "zstd_compress_internal.h" =20 /* @@ -22,98 +22,173 @@ */ #define ZSTD_LAZY_DDSS_BUCKET_LOG 2 =20 -U32 ZSTD_insertAndFindFirstIndex(ZSTD_matchState_t* ms, const BYTE* ip); -void ZSTD_row_update(ZSTD_matchState_t* const ms, const BYTE* ip); +#define ZSTD_ROW_HASH_TAG_BITS 8 /* nb bits to use for the tag */ + +#if !defined(ZSTD_EXCLUDE_GREEDY_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_LAZY_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_LAZY2_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR) +U32 ZSTD_insertAndFindFirstIndex(ZSTD_MatchState_t* ms, const BYTE* ip); +void ZSTD_row_update(ZSTD_MatchState_t* const ms, const BYTE* ip); =20 -void ZSTD_dedicatedDictSearch_lazy_loadDictionary(ZSTD_matchState_t* ms, c= onst BYTE* const ip); +void ZSTD_dedicatedDictSearch_lazy_loadDictionary(ZSTD_MatchState_t* ms, c= onst BYTE* const ip); =20 void ZSTD_preserveUnsortedMark (U32* const table, U32 const size, U32 cons= t reducerValue); /*! used in ZSTD_reduceIndex(). 
preemptively increase val= ue of ZSTD_DUBT_UNSORTED_MARK */ +#endif =20 -size_t ZSTD_compressBlock_btlazy2( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_GREEDY_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_greedy( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy2( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_greedy( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_dictMatchState_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy2_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_dedicatedDictSearch( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_dedicatedDictSearch_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_greedy_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_greedy_extDict( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + void const* src, size_t srcSize); +size_t ZSTD_compressBlock_greedy_extDict_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); =20 -size_t ZSTD_compressBlock_btlazy2_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#define ZSTD_COMPRESSBLOCK_GREEDY ZSTD_compressBlock_greedy +#define ZSTD_COMPRESSBLOCK_GREEDY_ROW ZSTD_compressBlock_greedy_row +#define ZSTD_COMPRESSBLOCK_GREEDY_DICTMATCHSTATE ZSTD_compressBlock_greedy= _dictMatchState +#define ZSTD_COMPRESSBLOCK_GREEDY_DICTMATCHSTATE_ROW ZSTD_compressBlock_gr= eedy_dictMatchState_row +#define ZSTD_COMPRESSBLOCK_GREEDY_DEDICATEDDICTSEARCH ZSTD_compressBlock_g= reedy_dedicatedDictSearch +#define ZSTD_COMPRESSBLOCK_GREEDY_DEDICATEDDICTSEARCH_ROW ZSTD_compressBlo= ck_greedy_dedicatedDictSearch_row +#define ZSTD_COMPRESSBLOCK_GREEDY_EXTDICT ZSTD_compressBlock_greedy_extDict +#define ZSTD_COMPRESSBLOCK_GREEDY_EXTDICT_ROW ZSTD_compressBlock_greedy_ex= tDict_row +#else +#define ZSTD_COMPRESSBLOCK_GREEDY NULL +#define ZSTD_COMPRESSBLOCK_GREEDY_ROW NULL +#define ZSTD_COMPRESSBLOCK_GREEDY_DICTMATCHSTATE NULL +#define ZSTD_COMPRESSBLOCK_GREEDY_DICTMATCHSTATE_ROW NULL +#define ZSTD_COMPRESSBLOCK_GREEDY_DEDICATEDDICTSEARCH NULL +#define ZSTD_COMPRESSBLOCK_GREEDY_DEDICATEDDICTSEARCH_ROW NULL +#define ZSTD_COMPRESSBLOCK_GREEDY_EXTDICT NULL +#define ZSTD_COMPRESSBLOCK_GREEDY_EXTDICT_ROW NULL +#endif + +#ifndef ZSTD_EXCLUDE_LAZY_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_lazy( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t 
ZSTD_compressBlock_lazy2_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_lazy_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], - void const* src, size_t srcSize); -size_t ZSTD_compressBlock_greedy_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], - void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy2_dictMatchState_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_lazy_dictMatchState_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_greedy_dictMatchState_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_dedicatedDictSearch( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); - -size_t ZSTD_compressBlock_lazy2_dedicatedDictSearch( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_dedicatedDictSearch_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy_dedicatedDictSearch( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_extDict( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_greedy_dedicatedDictSearch( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy_extDict_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy2_dedicatedDictSearch_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + +#define ZSTD_COMPRESSBLOCK_LAZY ZSTD_compressBlock_lazy +#define ZSTD_COMPRESSBLOCK_LAZY_ROW ZSTD_compressBlock_lazy_row +#define ZSTD_COMPRESSBLOCK_LAZY_DICTMATCHSTATE ZSTD_compressBlock_lazy_dic= tMatchState +#define ZSTD_COMPRESSBLOCK_LAZY_DICTMATCHSTATE_ROW ZSTD_compressBlock_lazy= _dictMatchState_row +#define ZSTD_COMPRESSBLOCK_LAZY_DEDICATEDDICTSEARCH ZSTD_compressBlock_laz= y_dedicatedDictSearch +#define ZSTD_COMPRESSBLOCK_LAZY_DEDICATEDDICTSEARCH_ROW ZSTD_compressBlock= _lazy_dedicatedDictSearch_row +#define ZSTD_COMPRESSBLOCK_LAZY_EXTDICT ZSTD_compressBlock_lazy_extDict +#define ZSTD_COMPRESSBLOCK_LAZY_EXTDICT_ROW ZSTD_compressBlock_lazy_extDic= t_row +#else +#define ZSTD_COMPRESSBLOCK_LAZY NULL +#define ZSTD_COMPRESSBLOCK_LAZY_ROW NULL +#define ZSTD_COMPRESSBLOCK_LAZY_DICTMATCHSTATE NULL +#define ZSTD_COMPRESSBLOCK_LAZY_DICTMATCHSTATE_ROW NULL +#define ZSTD_COMPRESSBLOCK_LAZY_DEDICATEDDICTSEARCH NULL +#define ZSTD_COMPRESSBLOCK_LAZY_DEDICATEDDICTSEARCH_ROW NULL +#define ZSTD_COMPRESSBLOCK_LAZY_EXTDICT NULL +#define ZSTD_COMPRESSBLOCK_LAZY_EXTDICT_ROW NULL +#endif + +#ifndef ZSTD_EXCLUDE_LAZY2_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_lazy2( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t 
ZSTD_compressBlock_lazy_dedicatedDictSearch_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_greedy_dedicatedDictSearch_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); - -size_t ZSTD_compressBlock_greedy_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_dictMatchState_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_dedicatedDictSearch( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + void const* src, size_t srcSize); +size_t ZSTD_compressBlock_lazy2_dedicatedDictSearch_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_lazy2_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_greedy_extDict_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_lazy2_extDict_row( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy_extDict_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + +#define ZSTD_COMPRESSBLOCK_LAZY2 ZSTD_compressBlock_lazy2 +#define ZSTD_COMPRESSBLOCK_LAZY2_ROW ZSTD_compressBlock_lazy2_row +#define ZSTD_COMPRESSBLOCK_LAZY2_DICTMATCHSTATE ZSTD_compressBlock_lazy2_d= ictMatchState +#define ZSTD_COMPRESSBLOCK_LAZY2_DICTMATCHSTATE_ROW ZSTD_compressBlock_laz= y2_dictMatchState_row +#define ZSTD_COMPRESSBLOCK_LAZY2_DEDICATEDDICTSEARCH ZSTD_compressBlock_la= zy2_dedicatedDictSearch +#define ZSTD_COMPRESSBLOCK_LAZY2_DEDICATEDDICTSEARCH_ROW ZSTD_compressBloc= k_lazy2_dedicatedDictSearch_row +#define ZSTD_COMPRESSBLOCK_LAZY2_EXTDICT ZSTD_compressBlock_lazy2_extDict +#define ZSTD_COMPRESSBLOCK_LAZY2_EXTDICT_ROW ZSTD_compressBlock_lazy2_extD= ict_row +#else +#define ZSTD_COMPRESSBLOCK_LAZY2 NULL +#define ZSTD_COMPRESSBLOCK_LAZY2_ROW NULL +#define ZSTD_COMPRESSBLOCK_LAZY2_DICTMATCHSTATE NULL +#define ZSTD_COMPRESSBLOCK_LAZY2_DICTMATCHSTATE_ROW NULL +#define ZSTD_COMPRESSBLOCK_LAZY2_DEDICATEDDICTSEARCH NULL +#define ZSTD_COMPRESSBLOCK_LAZY2_DEDICATEDDICTSEARCH_ROW NULL +#define ZSTD_COMPRESSBLOCK_LAZY2_EXTDICT NULL +#define ZSTD_COMPRESSBLOCK_LAZY2_EXTDICT_ROW NULL +#endif + +#ifndef ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_btlazy2( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_lazy2_extDict_row( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_btlazy2_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_btlazy2_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + 
ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); - =20 =20 +#define ZSTD_COMPRESSBLOCK_BTLAZY2 ZSTD_compressBlock_btlazy2 +#define ZSTD_COMPRESSBLOCK_BTLAZY2_DICTMATCHSTATE ZSTD_compressBlock_btlaz= y2_dictMatchState +#define ZSTD_COMPRESSBLOCK_BTLAZY2_EXTDICT ZSTD_compressBlock_btlazy2_extD= ict +#else +#define ZSTD_COMPRESSBLOCK_BTLAZY2 NULL +#define ZSTD_COMPRESSBLOCK_BTLAZY2_DICTMATCHSTATE NULL +#define ZSTD_COMPRESSBLOCK_BTLAZY2_EXTDICT NULL +#endif =20 #endif /* ZSTD_LAZY_H */ diff --git a/lib/zstd/compress/zstd_ldm.c b/lib/zstd/compress/zstd_ldm.c index dd86fc83e7dd..54eefad9cae6 100644 --- a/lib/zstd/compress/zstd_ldm.c +++ b/lib/zstd/compress/zstd_ldm.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -16,7 +17,7 @@ #include "zstd_double_fast.h" /* ZSTD_fillDoubleHashTable() */ #include "zstd_ldm_geartab.h" =20 -#define LDM_BUCKET_SIZE_LOG 3 +#define LDM_BUCKET_SIZE_LOG 4 #define LDM_MIN_MATCH_LENGTH 64 #define LDM_HASH_RLOG 7 =20 @@ -133,21 +134,35 @@ static size_t ZSTD_ldm_gear_feed(ldmRollingHashState_= t* state, } =20 void ZSTD_ldm_adjustParameters(ldmParams_t* params, - ZSTD_compressionParameters const* cParams) + const ZSTD_compressionParameters* cParams) { params->windowLog =3D cParams->windowLog; ZSTD_STATIC_ASSERT(LDM_BUCKET_SIZE_LOG <=3D ZSTD_LDM_BUCKETSIZELOG_MAX= ); DEBUGLOG(4, "ZSTD_ldm_adjustParameters"); - if (!params->bucketSizeLog) params->bucketSizeLog =3D LDM_BUCKET_SIZE_= LOG; - if (!params->minMatchLength) params->minMatchLength =3D LDM_MIN_MATCH_= LENGTH; + if (params->hashRateLog =3D=3D 0) { + if (params->hashLog > 0) { + /* if params->hashLog is set, derive hashRateLog from it */ + assert(params->hashLog <=3D ZSTD_HASHLOG_MAX); + if (params->windowLog > params->hashLog) { + params->hashRateLog =3D params->windowLog - params->hashLo= g; + } + } else { + assert(1 <=3D (int)cParams->strategy && (int)cParams->strategy= <=3D 9); + /* mapping from [fast, rate7] to [btultra2, rate4] */ + params->hashRateLog =3D 7 - (cParams->strategy/3); + } + } if (params->hashLog =3D=3D 0) { - params->hashLog =3D MAX(ZSTD_HASHLOG_MIN, params->windowLog - LDM_= HASH_RLOG); - assert(params->hashLog <=3D ZSTD_HASHLOG_MAX); + params->hashLog =3D BOUNDED(ZSTD_HASHLOG_MIN, params->windowLog - = params->hashRateLog, ZSTD_HASHLOG_MAX); } - if (params->hashRateLog =3D=3D 0) { - params->hashRateLog =3D params->windowLog < params->hashLog - ? 0 - : params->windowLog - params->hashLog; + if (params->minMatchLength =3D=3D 0) { + params->minMatchLength =3D LDM_MIN_MATCH_LENGTH; + if (cParams->strategy >=3D ZSTD_btultra) + params->minMatchLength /=3D 2; + } + if (params->bucketSizeLog=3D=3D0) { + assert(1 <=3D (int)cParams->strategy && (int)cParams->strategy <= =3D 9); + params->bucketSizeLog =3D BOUNDED(LDM_BUCKET_SIZE_LOG, (U32)cParam= s->strategy, ZSTD_LDM_BUCKETSIZELOG_MAX); } params->bucketSizeLog =3D MIN(params->bucketSizeLog, params->hashLog); } @@ -170,22 +185,22 @@ size_t ZSTD_ldm_getMaxNbSeq(ldmParams_t params, size_= t maxChunkSize) /* ZSTD_ldm_getBucket() : * Returns a pointer to the start of the bucket associated with hash. 
*/ static ldmEntry_t* ZSTD_ldm_getBucket( - ldmState_t* ldmState, size_t hash, ldmParams_t const ldmParams) + const ldmState_t* ldmState, size_t hash, U32 const bucketSizeLog) { - return ldmState->hashTable + (hash << ldmParams.bucketSizeLog); + return ldmState->hashTable + (hash << bucketSizeLog); } =20 /* ZSTD_ldm_insertEntry() : * Insert the entry with corresponding hash into the hash table */ static void ZSTD_ldm_insertEntry(ldmState_t* ldmState, size_t const hash, const ldmEntry_t entry, - ldmParams_t const ldmParams) + U32 const bucketSizeLog) { BYTE* const pOffset =3D ldmState->bucketOffsets + hash; unsigned const offset =3D *pOffset; =20 - *(ZSTD_ldm_getBucket(ldmState, hash, ldmParams) + offset) =3D entry; - *pOffset =3D (BYTE)((offset + 1) & ((1u << ldmParams.bucketSizeLog) - = 1)); + *(ZSTD_ldm_getBucket(ldmState, hash, bucketSizeLog) + offset) =3D entr= y; + *pOffset =3D (BYTE)((offset + 1) & ((1u << bucketSizeLog) - 1)); =20 } =20 @@ -234,7 +249,7 @@ static size_t ZSTD_ldm_countBackwardsMatch_2segments( * * The tables for the other strategies are filled within their * block compressors. */ -static size_t ZSTD_ldm_fillFastTables(ZSTD_matchState_t* ms, +static size_t ZSTD_ldm_fillFastTables(ZSTD_MatchState_t* ms, void const* end) { const BYTE* const iend =3D (const BYTE*)end; @@ -242,11 +257,15 @@ static size_t ZSTD_ldm_fillFastTables(ZSTD_matchState= _t* ms, switch(ms->cParams.strategy) { case ZSTD_fast: - ZSTD_fillHashTable(ms, iend, ZSTD_dtlm_fast); + ZSTD_fillHashTable(ms, iend, ZSTD_dtlm_fast, ZSTD_tfp_forCCtx); break; =20 case ZSTD_dfast: - ZSTD_fillDoubleHashTable(ms, iend, ZSTD_dtlm_fast); +#ifndef ZSTD_EXCLUDE_DFAST_BLOCK_COMPRESSOR + ZSTD_fillDoubleHashTable(ms, iend, ZSTD_dtlm_fast, ZSTD_tfp_forCCt= x); +#else + assert(0); /* shouldn't be called: cparams should've been adjusted= . */ +#endif break; =20 case ZSTD_greedy: @@ -269,7 +288,8 @@ void ZSTD_ldm_fillHashTable( const BYTE* iend, ldmParams_t const* params) { U32 const minMatchLength =3D params->minMatchLength; - U32 const hBits =3D params->hashLog - params->bucketSizeLog; + U32 const bucketSizeLog =3D params->bucketSizeLog; + U32 const hBits =3D params->hashLog - bucketSizeLog; BYTE const* const base =3D ldmState->window.base; BYTE const* const istart =3D ip; ldmRollingHashState_t hashState; @@ -284,7 +304,7 @@ void ZSTD_ldm_fillHashTable( unsigned n; =20 numSplits =3D 0; - hashed =3D ZSTD_ldm_gear_feed(&hashState, ip, iend - ip, splits, &= numSplits); + hashed =3D ZSTD_ldm_gear_feed(&hashState, ip, (size_t)(iend - ip),= splits, &numSplits); =20 for (n =3D 0; n < numSplits; n++) { if (ip + splits[n] >=3D istart + minMatchLength) { @@ -295,7 +315,7 @@ void ZSTD_ldm_fillHashTable( =20 entry.offset =3D (U32)(split - base); entry.checksum =3D (U32)(xxhash >> 32); - ZSTD_ldm_insertEntry(ldmState, hash, entry, *params); + ZSTD_ldm_insertEntry(ldmState, hash, entry, params->bucket= SizeLog); } } =20 @@ -309,7 +329,7 @@ void ZSTD_ldm_fillHashTable( * Sets cctx->nextToUpdate to a position corresponding closer to anchor * if it is far away * (after a long match, only update tables a limited amount). 
*/ -static void ZSTD_ldm_limitTableUpdate(ZSTD_matchState_t* ms, const BYTE* a= nchor) +static void ZSTD_ldm_limitTableUpdate(ZSTD_MatchState_t* ms, const BYTE* a= nchor) { U32 const curr =3D (U32)(anchor - ms->window.base); if (curr > ms->nextToUpdate + 1024) { @@ -318,8 +338,10 @@ static void ZSTD_ldm_limitTableUpdate(ZSTD_matchState_= t* ms, const BYTE* anchor) } } =20 -static size_t ZSTD_ldm_generateSequences_internal( - ldmState_t* ldmState, rawSeqStore_t* rawSeqStore, +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t ZSTD_ldm_generateSequences_internal( + ldmState_t* ldmState, RawSeqStore_t* rawSeqStore, ldmParams_t const* params, void const* src, size_t srcSize) { /* LDM parameters */ @@ -373,7 +395,7 @@ static size_t ZSTD_ldm_generateSequences_internal( candidates[n].split =3D split; candidates[n].hash =3D hash; candidates[n].checksum =3D (U32)(xxhash >> 32); - candidates[n].bucket =3D ZSTD_ldm_getBucket(ldmState, hash, *p= arams); + candidates[n].bucket =3D ZSTD_ldm_getBucket(ldmState, hash, pa= rams->bucketSizeLog); PREFETCH_L1(candidates[n].bucket); } =20 @@ -396,7 +418,7 @@ static size_t ZSTD_ldm_generateSequences_internal( * the previous one, we merely register it in the hash table a= nd * move on */ if (split < anchor) { - ZSTD_ldm_insertEntry(ldmState, hash, newEntry, *params); + ZSTD_ldm_insertEntry(ldmState, hash, newEntry, params->buc= ketSizeLog); continue; } =20 @@ -443,7 +465,7 @@ static size_t ZSTD_ldm_generateSequences_internal( /* No match found -- insert an entry into the hash table * and process the next candidate match */ if (bestEntry =3D=3D NULL) { - ZSTD_ldm_insertEntry(ldmState, hash, newEntry, *params); + ZSTD_ldm_insertEntry(ldmState, hash, newEntry, params->buc= ketSizeLog); continue; } =20 @@ -464,7 +486,7 @@ static size_t ZSTD_ldm_generateSequences_internal( =20 /* Insert the current entry into the hash table --- it must be * done after the previous block to avoid clobbering bestEntry= */ - ZSTD_ldm_insertEntry(ldmState, hash, newEntry, *params); + ZSTD_ldm_insertEntry(ldmState, hash, newEntry, params->bucketS= izeLog); =20 anchor =3D split + forwardMatchLength; =20 @@ -503,7 +525,7 @@ static void ZSTD_ldm_reduceTable(ldmEntry_t* const tabl= e, U32 const size, } =20 size_t ZSTD_ldm_generateSequences( - ldmState_t* ldmState, rawSeqStore_t* sequences, + ldmState_t* ldmState, RawSeqStore_t* sequences, ldmParams_t const* params, void const* src, size_t srcSize) { U32 const maxDist =3D 1U << params->windowLog; @@ -549,7 +571,7 @@ size_t ZSTD_ldm_generateSequences( * the window through early invalidation. * TODO: * Test the chunk size. * * Try invalidation after the sequence generation and test= the - * the offset against maxDist directly. + * offset against maxDist directly. * * NOTE: Because of dictionaries + sequence splitting we MUST make= sure * that any offset used is valid at the END of the sequence, since= it may @@ -580,7 +602,7 @@ size_t ZSTD_ldm_generateSequences( } =20 void -ZSTD_ldm_skipSequences(rawSeqStore_t* rawSeqStore, size_t srcSize, U32 con= st minMatch) +ZSTD_ldm_skipSequences(RawSeqStore_t* rawSeqStore, size_t srcSize, U32 con= st minMatch) { while (srcSize > 0 && rawSeqStore->pos < rawSeqStore->size) { rawSeq* seq =3D rawSeqStore->seq + rawSeqStore->pos; @@ -616,7 +638,7 @@ ZSTD_ldm_skipSequences(rawSeqStore_t* rawSeqStore, size= _t srcSize, U32 const min * Returns the current sequence to handle, or if the rest of the block sho= uld * be literals, it returns a sequence with offset =3D=3D 0. 
*/ -static rawSeq maybeSplitSequence(rawSeqStore_t* rawSeqStore, +static rawSeq maybeSplitSequence(RawSeqStore_t* rawSeqStore, U32 const remaining, U32 const minMatch) { rawSeq sequence =3D rawSeqStore->seq[rawSeqStore->pos]; @@ -640,7 +662,7 @@ static rawSeq maybeSplitSequence(rawSeqStore_t* rawSeqS= tore, return sequence; } =20 -void ZSTD_ldm_skipRawSeqStoreBytes(rawSeqStore_t* rawSeqStore, size_t nbBy= tes) { +void ZSTD_ldm_skipRawSeqStoreBytes(RawSeqStore_t* rawSeqStore, size_t nbBy= tes) { U32 currPos =3D (U32)(rawSeqStore->posInSequence + nbBytes); while (currPos && rawSeqStore->pos < rawSeqStore->size) { rawSeq currSeq =3D rawSeqStore->seq[rawSeqStore->pos]; @@ -657,14 +679,14 @@ void ZSTD_ldm_skipRawSeqStoreBytes(rawSeqStore_t* raw= SeqStore, size_t nbBytes) { } } =20 -size_t ZSTD_ldm_blockCompress(rawSeqStore_t* rawSeqStore, - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], - ZSTD_paramSwitch_e useRowMatchFinder, +size_t ZSTD_ldm_blockCompress(RawSeqStore_t* rawSeqStore, + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_ParamSwitch_e useRowMatchFinder, void const* src, size_t srcSize) { const ZSTD_compressionParameters* const cParams =3D &ms->cParams; unsigned const minMatch =3D cParams->minMatch; - ZSTD_blockCompressor const blockCompressor =3D + ZSTD_BlockCompressor_f const blockCompressor =3D ZSTD_selectBlockCompressor(cParams->strategy, useRowMatchFinder, Z= STD_matchState_dictMode(ms)); /* Input bounds */ BYTE const* const istart =3D (BYTE const*)src; @@ -689,7 +711,6 @@ size_t ZSTD_ldm_blockCompress(rawSeqStore_t* rawSeqStor= e, /* maybeSplitSequence updates rawSeqStore->pos */ rawSeq const sequence =3D maybeSplitSequence(rawSeqStore, (U32)(iend - ip), minMa= tch); - int i; /* End signal */ if (sequence.offset =3D=3D 0) break; @@ -702,6 +723,7 @@ size_t ZSTD_ldm_blockCompress(rawSeqStore_t* rawSeqStor= e, /* Run the block compressor */ DEBUGLOG(5, "pos %u : calling block compressor on segment of size = %u", (unsigned)(ip-istart), sequence.litLength); { + int i; size_t const newLitLength =3D blockCompressor(ms, seqStore, rep, ip, sequence.litLength); ip +=3D sequence.litLength; @@ -711,7 +733,7 @@ size_t ZSTD_ldm_blockCompress(rawSeqStore_t* rawSeqStor= e, rep[0] =3D sequence.offset; /* Store the sequence */ ZSTD_storeSeq(seqStore, newLitLength, ip - newLitLength, iend, - STORE_OFFSET(sequence.offset), + OFFSET_TO_OFFBASE(sequence.offset), sequence.matchLength); ip +=3D sequence.matchLength; } diff --git a/lib/zstd/compress/zstd_ldm.h b/lib/zstd/compress/zstd_ldm.h index fbc6a5e88fd7..41400a7191b2 100644 --- a/lib/zstd/compress/zstd_ldm.h +++ b/lib/zstd/compress/zstd_ldm.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -11,7 +12,6 @@ #ifndef ZSTD_LDM_H #define ZSTD_LDM_H =20 - #include "zstd_compress_internal.h" /* ldmParams_t, U32 */ #include /* ZSTD_CCtx, size_t */ =20 @@ -40,7 +40,7 @@ void ZSTD_ldm_fillHashTable( * sequences. */ size_t ZSTD_ldm_generateSequences( - ldmState_t* ldms, rawSeqStore_t* sequences, + ldmState_t* ldms, RawSeqStore_t* sequences, ldmParams_t const* params, void const* src, size_t srcSize); =20 /* @@ -61,9 +61,9 @@ size_t ZSTD_ldm_generateSequences( * two. We handle that case correctly, and update `rawSeqStore` appropriat= ely. 
* NOTE: This function does not return any errors. */ -size_t ZSTD_ldm_blockCompress(rawSeqStore_t* rawSeqStore, - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_= NUM], - ZSTD_paramSwitch_e useRowMatchFinder, +size_t ZSTD_ldm_blockCompress(RawSeqStore_t* rawSeqStore, + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_= NUM], + ZSTD_ParamSwitch_e useRowMatchFinder, void const* src, size_t srcSize); =20 /* @@ -73,7 +73,7 @@ size_t ZSTD_ldm_blockCompress(rawSeqStore_t* rawSeqStore, * Avoids emitting matches less than `minMatch` bytes. * Must be called for data that is not passed to ZSTD_ldm_blockCompress(). */ -void ZSTD_ldm_skipSequences(rawSeqStore_t* rawSeqStore, size_t srcSize, +void ZSTD_ldm_skipSequences(RawSeqStore_t* rawSeqStore, size_t srcSize, U32 const minMatch); =20 /* ZSTD_ldm_skipRawSeqStoreBytes(): @@ -81,7 +81,7 @@ void ZSTD_ldm_skipSequences(rawSeqStore_t* rawSeqStore, s= ize_t srcSize, * Not to be used in conjunction with ZSTD_ldm_skipSequences(). * Must be called for data which is not passed to ZSTD_ldm_blockCompress(). */ -void ZSTD_ldm_skipRawSeqStoreBytes(rawSeqStore_t* rawSeqStore, size_t nbBy= tes); +void ZSTD_ldm_skipRawSeqStoreBytes(RawSeqStore_t* rawSeqStore, size_t nbBy= tes); =20 /* ZSTD_ldm_getTableSize() : * Estimate the space needed for long distance matching tables or 0 if LD= M is @@ -107,5 +107,4 @@ size_t ZSTD_ldm_getMaxNbSeq(ldmParams_t params, size_t = maxChunkSize); void ZSTD_ldm_adjustParameters(ldmParams_t* params, ZSTD_compressionParameters const* cParams); =20 - #endif /* ZSTD_FAST_H */ diff --git a/lib/zstd/compress/zstd_ldm_geartab.h b/lib/zstd/compress/zstd_= ldm_geartab.h index 647f865be290..cfccfc46f6f7 100644 --- a/lib/zstd/compress/zstd_ldm_geartab.h +++ b/lib/zstd/compress/zstd_ldm_geartab.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the diff --git a/lib/zstd/compress/zstd_opt.c b/lib/zstd/compress/zstd_opt.c index fd82acfda62f..b62fd1b0d83e 100644 --- a/lib/zstd/compress/zstd_opt.c +++ b/lib/zstd/compress/zstd_opt.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Przemyslaw Skibinski, Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -12,11 +13,14 @@ #include "hist.h" #include "zstd_opt.h" =20 +#if !defined(ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_BTOPT_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_BTULTRA_BLOCK_COMPRESSOR) =20 #define ZSTD_LITFREQ_ADD 2 /* scaling factor for litFreq, so that fre= quencies adapt faster to new stats */ #define ZSTD_MAX_PRICE (1<<30) =20 -#define ZSTD_PREDEF_THRESHOLD 1024 /* if srcSize < ZSTD_PREDEF_THRESHOLD= , symbols' cost is assumed static, directly determined by pre-defined distr= ibutions */ +#define ZSTD_PREDEF_THRESHOLD 8 /* if srcSize < ZSTD_PREDEF_THRESHOLD, s= ymbols' cost is assumed static, directly determined by pre-defined distribu= tions */ =20 =20 /*-************************************* @@ -26,27 +30,35 @@ #if 0 /* approximation at bit level (for tests) */ # define BITCOST_ACCURACY 0 # define BITCOST_MULTIPLIER (1 << BITCOST_ACCURACY) -# define WEIGHT(stat, opt) ((void)opt, ZSTD_bitWeight(stat)) +# define WEIGHT(stat, opt) ((void)(opt), ZSTD_bitWeight(stat)) #elif 0 /* fractional bit accuracy (for tests) */ # define BITCOST_ACCURACY 8 # define BITCOST_MULTIPLIER (1 << BITCOST_ACCURACY) -# define WEIGHT(stat,opt) ((void)opt, ZSTD_fracWeight(stat)) +# define WEIGHT(stat,opt) ((void)(opt), ZSTD_fracWeight(stat)) #else /* opt=3D=3Dapprox, ultra=3D=3Daccurate */ # define BITCOST_ACCURACY 8 # define BITCOST_MULTIPLIER (1 << BITCOST_ACCURACY) -# define WEIGHT(stat,opt) (opt ? ZSTD_fracWeight(stat) : ZSTD_bitWeight(s= tat)) +# define WEIGHT(stat,opt) ((opt) ? ZSTD_fracWeight(stat) : ZSTD_bitWeight= (stat)) #endif =20 +/* ZSTD_bitWeight() : + * provide estimated "cost" of a stat in full bits only */ MEM_STATIC U32 ZSTD_bitWeight(U32 stat) { return (ZSTD_highbit32(stat+1) * BITCOST_MULTIPLIER); } =20 +/* ZSTD_fracWeight() : + * provide fractional-bit "cost" of a stat, + * using linear interpolation approximation */ MEM_STATIC U32 ZSTD_fracWeight(U32 rawStat) { U32 const stat =3D rawStat + 1; U32 const hb =3D ZSTD_highbit32(stat); U32 const BWeight =3D hb * BITCOST_MULTIPLIER; + /* Fweight was meant for "Fractional weight" + * but it's effectively a value between 1 and 2 + * using fixed point arithmetic */ U32 const FWeight =3D (stat << BITCOST_ACCURACY) >> hb; U32 const weight =3D BWeight + FWeight; assert(hb + BITCOST_ACCURACY < 31); @@ -57,7 +69,7 @@ MEM_STATIC U32 ZSTD_fracWeight(U32 rawStat) /* debugging function, * @return price in bytes as fractional value * for debug messages only */ -MEM_STATIC double ZSTD_fCost(U32 price) +MEM_STATIC double ZSTD_fCost(int price) { return (double)price / (BITCOST_MULTIPLIER*8); } @@ -88,20 +100,26 @@ static U32 sum_u32(const unsigned table[], size_t nbEl= ts) return total; } =20 -static U32 ZSTD_downscaleStats(unsigned* table, U32 lastEltIndex, U32 shif= t) +typedef enum { base_0possible=3D0, base_1guaranteed=3D1 } base_directive_e; + +static U32 +ZSTD_downscaleStats(unsigned* table, U32 lastEltIndex, U32 shift, base_dir= ective_e base1) { U32 s, sum=3D0; - DEBUGLOG(5, "ZSTD_downscaleStats (nbElts=3D%u, shift=3D%u)", (unsigned= )lastEltIndex+1, (unsigned)shift); + DEBUGLOG(5, "ZSTD_downscaleStats (nbElts=3D%u, shift=3D%u)", + (unsigned)lastEltIndex+1, (unsigned)shift ); assert(shift < 30); for (s=3D0; s> shift); - sum +=3D table[s]; + unsigned const base =3D base1 ? 
1 : (table[s]>0); + unsigned const newStat =3D base + (table[s] >> shift); + sum +=3D newStat; + table[s] =3D newStat; } return sum; } =20 /* ZSTD_scaleStats() : - * reduce all elements in table is sum too large + * reduce all elt frequencies in table if sum too large * return the resulting sum of elements */ static U32 ZSTD_scaleStats(unsigned* table, U32 lastEltIndex, U32 logTarge= t) { @@ -110,7 +128,7 @@ static U32 ZSTD_scaleStats(unsigned* table, U32 lastElt= Index, U32 logTarget) DEBUGLOG(5, "ZSTD_scaleStats (nbElts=3D%u, target=3D%u)", (unsigned)la= stEltIndex+1, (unsigned)logTarget); assert(logTarget < 30); if (factor <=3D 1) return prevsum; - return ZSTD_downscaleStats(table, lastEltIndex, ZSTD_highbit32(factor)= ); + return ZSTD_downscaleStats(table, lastEltIndex, ZSTD_highbit32(factor)= , base_1guaranteed); } =20 /* ZSTD_rescaleFreqs() : @@ -129,18 +147,22 @@ ZSTD_rescaleFreqs(optState_t* const optPtr, DEBUGLOG(5, "ZSTD_rescaleFreqs (srcSize=3D%u)", (unsigned)srcSize); optPtr->priceType =3D zop_dynamic; =20 - if (optPtr->litLengthSum =3D=3D 0) { /* first block : init */ - if (srcSize <=3D ZSTD_PREDEF_THRESHOLD) { /* heuristic */ - DEBUGLOG(5, "(srcSize <=3D ZSTD_PREDEF_THRESHOLD) =3D> zop_pre= def"); + if (optPtr->litLengthSum =3D=3D 0) { /* no literals stats collected -= > first block assumed -> init */ + + /* heuristic: use pre-defined stats for too small inputs */ + if (srcSize <=3D ZSTD_PREDEF_THRESHOLD) { + DEBUGLOG(5, "srcSize <=3D %i : use predefined stats", ZSTD_PRE= DEF_THRESHOLD); optPtr->priceType =3D zop_predef; } =20 assert(optPtr->symbolCosts !=3D NULL); if (optPtr->symbolCosts->huf.repeatMode =3D=3D HUF_repeat_valid) { - /* huffman table presumed generated by dictionary */ + + /* huffman stats covering the full value set : table presumed = generated by dictionary */ optPtr->priceType =3D zop_dynamic; =20 if (compressedLiterals) { + /* generate literals statistics from huffman table */ unsigned lit; assert(optPtr->litFreq !=3D NULL); optPtr->litSum =3D 0; @@ -188,13 +210,14 @@ ZSTD_rescaleFreqs(optState_t* const optPtr, optPtr->offCodeSum +=3D optPtr->offCodeFreq[of]; } } =20 - } else { /* not a dictionary */ + } else { /* first block, no dictionary */ =20 assert(optPtr->litFreq !=3D NULL); if (compressedLiterals) { + /* base initial cost of literals on direct frequency withi= n src */ unsigned lit =3D MaxLit; HIST_count_simple(optPtr->litFreq, &lit, src, srcSize); = /* use raw first block to init statistics */ - optPtr->litSum =3D ZSTD_downscaleStats(optPtr->litFreq, Ma= xLit, 8); + optPtr->litSum =3D ZSTD_downscaleStats(optPtr->litFreq, Ma= xLit, 8, base_0possible); } =20 { unsigned const baseLLfreqs[MaxLL+1] =3D { @@ -224,10 +247,9 @@ ZSTD_rescaleFreqs(optState_t* const optPtr, optPtr->offCodeSum =3D sum_u32(baseOFCfreqs, MaxOff+1); } =20 - } =20 - } else { /* new block : re-use previous statistics, scaled down */ + } else { /* new block : scale down accumulated statistics */ =20 if (compressedLiterals) optPtr->litSum =3D ZSTD_scaleStats(optPtr->litFreq, MaxLit, 12= ); @@ -246,6 +268,7 @@ static U32 ZSTD_rawLiteralsCost(const BYTE* const liter= als, U32 const litLength, const optState_t* const optPtr, int optLevel) { + DEBUGLOG(8, "ZSTD_rawLiteralsCost (%u literals)", litLength); if (litLength =3D=3D 0) return 0; =20 if (!ZSTD_compressedLiterals(optPtr)) @@ -255,11 +278,14 @@ static U32 ZSTD_rawLiteralsCost(const BYTE* const lit= erals, U32 const litLength, return (litLength*6) * BITCOST_MULTIPLIER; /* 6 bit per literal -= no statistic used */ =20 /* dynamic 
statistics */ - { U32 price =3D litLength * optPtr->litSumBasePrice; + { U32 price =3D optPtr->litSumBasePrice * litLength; + U32 const litPriceMax =3D optPtr->litSumBasePrice - BITCOST_MULTIP= LIER; U32 u; + assert(optPtr->litSumBasePrice >=3D BITCOST_MULTIPLIER); for (u=3D0; u < litLength; u++) { - assert(WEIGHT(optPtr->litFreq[literals[u]], optLevel) <=3D opt= Ptr->litSumBasePrice); /* literal cost should never be negative */ - price -=3D WEIGHT(optPtr->litFreq[literals[u]], optLevel); + U32 litPrice =3D WEIGHT(optPtr->litFreq[literals[u]], optLevel= ); + if (UNLIKELY(litPrice > litPriceMax)) litPrice =3D litPriceMax; + price -=3D litPrice; } return price; } @@ -272,10 +298,11 @@ static U32 ZSTD_litLengthPrice(U32 const litLength, c= onst optState_t* const optP assert(litLength <=3D ZSTD_BLOCKSIZE_MAX); if (optPtr->priceType =3D=3D zop_predef) return WEIGHT(litLength, optLevel); - /* We can't compute the litLength price for sizes >=3D ZSTD_BLOCKSIZE_= MAX - * because it isn't representable in the zstd format. So instead just - * call it 1 bit more than ZSTD_BLOCKSIZE_MAX - 1. In this case the bl= ock - * would be all literals. + + /* ZSTD_LLcode() can't compute litLength price for sizes >=3D ZSTD_BLO= CKSIZE_MAX + * because it isn't representable in the zstd format. + * So instead just pretend it would cost 1 bit more than ZSTD_BLOCKSIZ= E_MAX - 1. + * In such a case, the block would be all literals. */ if (litLength =3D=3D ZSTD_BLOCKSIZE_MAX) return BITCOST_MULTIPLIER + ZSTD_litLengthPrice(ZSTD_BLOCKSIZE_MAX= - 1, optPtr, optLevel); @@ -289,24 +316,25 @@ static U32 ZSTD_litLengthPrice(U32 const litLength, c= onst optState_t* const optP } =20 /* ZSTD_getMatchPrice() : - * Provides the cost of the match part (offset + matchLength) of a sequence + * Provides the cost of the match part (offset + matchLength) of a sequenc= e. * Must be combined with ZSTD_fullLiteralsCost() to get the full cost of a= sequence. 
- * @offcode : expects a scale where 0,1,2 are repcodes 1-3, and 3+ are rea= l_offsets+2 + * @offBase : sumtype, representing an offset or a repcode, and using nume= ric representation of ZSTD_storeSeq() * @optLevel: when <2, favors small offset for decompression speed (improv= ed cache efficiency) */ FORCE_INLINE_TEMPLATE U32 -ZSTD_getMatchPrice(U32 const offcode, +ZSTD_getMatchPrice(U32 const offBase, U32 const matchLength, const optState_t* const optPtr, int const optLevel) { U32 price; - U32 const offCode =3D ZSTD_highbit32(STORED_TO_OFFBASE(offcode)); + U32 const offCode =3D ZSTD_highbit32(offBase); U32 const mlBase =3D matchLength - MINMATCH; assert(matchLength >=3D MINMATCH); =20 - if (optPtr->priceType =3D=3D zop_predef) /* fixed scheme, do not use = statistics */ - return WEIGHT(mlBase, optLevel) + ((16 + offCode) * BITCOST_MULTIP= LIER); + if (optPtr->priceType =3D=3D zop_predef) /* fixed scheme, does not us= e statistics */ + return WEIGHT(mlBase, optLevel) + + ((16 + offCode) * BITCOST_MULTIPLIER); /* emulated offset c= ost */ =20 /* dynamic statistics */ price =3D (offCode * BITCOST_MULTIPLIER) + (optPtr->offCodeSumBasePric= e - WEIGHT(optPtr->offCodeFreq[offCode], optLevel)); @@ -325,10 +353,10 @@ ZSTD_getMatchPrice(U32 const offcode, } =20 /* ZSTD_updateStats() : - * assumption : literals + litLengtn <=3D iend */ + * assumption : literals + litLength <=3D iend */ static void ZSTD_updateStats(optState_t* const optPtr, U32 litLength, const BYTE* literals, - U32 offsetCode, U32 matchLength) + U32 offBase, U32 matchLength) { /* literals */ if (ZSTD_compressedLiterals(optPtr)) { @@ -344,8 +372,8 @@ static void ZSTD_updateStats(optState_t* const optPtr, optPtr->litLengthSum++; } =20 - /* offset code : expected to follow storeSeq() numeric representation = */ - { U32 const offCode =3D ZSTD_highbit32(STORED_TO_OFFBASE(offsetCode)= ); + /* offset code : follows storeSeq() numeric representation */ + { U32 const offCode =3D ZSTD_highbit32(offBase); assert(offCode <=3D MaxOff); optPtr->offCodeFreq[offCode]++; optPtr->offCodeSum++; @@ -379,9 +407,11 @@ MEM_STATIC U32 ZSTD_readMINMATCH(const void* memPtr, U= 32 length) =20 /* Update hashTable3 up to ip (excluded) Assumption : always within prefix (i.e. not within extDict) */ -static U32 ZSTD_insertAndFindFirstIndexHash3 (const ZSTD_matchState_t* ms, - U32* nextToUpdate3, - const BYTE* const ip) +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +U32 ZSTD_insertAndFindFirstIndexHash3 (const ZSTD_MatchState_t* ms, + U32* nextToUpdate3, + const BYTE* const ip) { U32* const hashTable3 =3D ms->hashTable3; U32 const hashLog3 =3D ms->hashLog3; @@ -408,8 +438,10 @@ static U32 ZSTD_insertAndFindFirstIndexHash3 (const ZS= TD_matchState_t* ms, * @param ip assumed <=3D iend-8 . 
* @param target The target of ZSTD_updateTree_internal() - we are filling= to this position * @return : nb of positions added */ -static U32 ZSTD_insertBt1( - const ZSTD_matchState_t* ms, +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +U32 ZSTD_insertBt1( + const ZSTD_MatchState_t* ms, const BYTE* const ip, const BYTE* const iend, U32 const target, U32 const mls, const int extDict) @@ -527,15 +559,16 @@ static U32 ZSTD_insertBt1( } =20 FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR void ZSTD_updateTree_internal( - ZSTD_matchState_t* ms, + ZSTD_MatchState_t* ms, const BYTE* const ip, const BYTE* const iend, const U32 mls, const ZSTD_dictMode_e dictMode) { const BYTE* const base =3D ms->window.base; U32 const target =3D (U32)(ip - base); U32 idx =3D ms->nextToUpdate; - DEBUGLOG(6, "ZSTD_updateTree_internal, from %u to %u (dictMode:%u)", + DEBUGLOG(7, "ZSTD_updateTree_internal, from %u to %u (dictMode:%u)", idx, target, dictMode); =20 while(idx < target) { @@ -548,20 +581,23 @@ void ZSTD_updateTree_internal( ms->nextToUpdate =3D target; } =20 -void ZSTD_updateTree(ZSTD_matchState_t* ms, const BYTE* ip, const BYTE* ie= nd) { +void ZSTD_updateTree(ZSTD_MatchState_t* ms, const BYTE* ip, const BYTE* ie= nd) { ZSTD_updateTree_internal(ms, ip, iend, ms->cParams.minMatch, ZSTD_noDi= ct); } =20 FORCE_INLINE_TEMPLATE -U32 ZSTD_insertBtAndGetAllMatches ( - ZSTD_match_t* matches, /* store result (found matche= s) in this table (presumed large enough) */ - ZSTD_matchState_t* ms, - U32* nextToUpdate3, - const BYTE* const ip, const BYTE* const iLimit, const = ZSTD_dictMode_e dictMode, - const U32 rep[ZSTD_REP_NUM], - U32 const ll0, /* tells if associated literal length= is 0 or not. This value must be 0 or 1 */ - const U32 lengthToBeat, - U32 const mls /* template */) +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +U32 +ZSTD_insertBtAndGetAllMatches ( + ZSTD_match_t* matches, /* store result (found matches) in= this table (presumed large enough) */ + ZSTD_MatchState_t* ms, + U32* nextToUpdate3, + const BYTE* const ip, const BYTE* const iLimit, + const ZSTD_dictMode_e dictMode, + const U32 rep[ZSTD_REP_NUM], + const U32 ll0, /* tells if associated literal length is 0= or not. This value must be 0 or 1 */ + const U32 lengthToBeat, + const U32 mls /* template */) { const ZSTD_compressionParameters* const cParams =3D &ms->cParams; U32 const sufficient_len =3D MIN(cParams->targetLength, ZSTD_OPT_NUM -= 1); @@ -590,7 +626,7 @@ U32 ZSTD_insertBtAndGetAllMatches ( U32 mnum =3D 0; U32 nbCompares =3D 1U << cParams->searchLog; =20 - const ZSTD_matchState_t* dms =3D dictMode =3D=3D ZSTD_dictMatchStat= e ? ms->dictMatchState : NULL; + const ZSTD_MatchState_t* dms =3D dictMode =3D=3D ZSTD_dictMatchStat= e ? ms->dictMatchState : NULL; const ZSTD_compressionParameters* const dmsCParams =3D dictMode =3D=3D ZSTD_dictMatchState = ? &dms->cParams : NULL; const BYTE* const dmsBase =3D dictMode =3D=3D ZSTD_dictMatchStat= e ? 
dms->window.base : NULL; @@ -629,13 +665,13 @@ U32 ZSTD_insertBtAndGetAllMatches ( assert(curr >=3D windowLow); if ( dictMode =3D=3D ZSTD_extDict && ( ((repOffset-1) /*intentional overflow*/ < curr - wi= ndowLow) /* equivalent to `curr > repIndex >=3D windowLow` */ - & (((U32)((dictLimit-1) - repIndex) >=3D 3) ) /* inte= ntional overflow : do not test positions overlapping 2 memory segments */) + & (ZSTD_index_overlap_check(dictLimit, repIndex)) ) && (ZSTD_readMINMATCH(ip, minMatch) =3D=3D ZSTD_readMINM= ATCH(repMatch, minMatch)) ) { repLen =3D (U32)ZSTD_count_2segments(ip+minMatch, repM= atch+minMatch, iLimit, dictEnd, prefixStart) + minMatch; } if (dictMode =3D=3D ZSTD_dictMatchState && ( ((repOffset-1) /*intentional overflow*/ < curr - (d= msLowLimit + dmsIndexDelta)) /* equivalent to `curr > repIndex >=3D dmsLow= Limit` */ - & ((U32)((dictLimit-1) - repIndex) >=3D 3) ) /* inten= tional overflow : do not test positions overlapping 2 memory segments */ + & (ZSTD_index_overlap_check(dictLimit, repIndex)) ) && (ZSTD_readMINMATCH(ip, minMatch) =3D=3D ZSTD_readMINM= ATCH(repMatch, minMatch)) ) { repLen =3D (U32)ZSTD_count_2segments(ip+minMatch, repM= atch+minMatch, iLimit, dmsEnd, prefixStart) + minMatch; } } @@ -644,7 +680,7 @@ U32 ZSTD_insertBtAndGetAllMatches ( DEBUGLOG(8, "found repCode %u (ll0:%u, offset:%u) of lengt= h %u", repCode, ll0, repOffset, repLen); bestLength =3D repLen; - matches[mnum].off =3D STORE_REPCODE(repCode - ll0 + 1); /= * expect value between 1 and 3 */ + matches[mnum].off =3D REPCODE_TO_OFFBASE(repCode - ll0 + 1= ); /* expect value between 1 and 3 */ matches[mnum].len =3D (U32)repLen; mnum++; if ( (repLen > sufficient_len) @@ -673,7 +709,7 @@ U32 ZSTD_insertBtAndGetAllMatches ( bestLength =3D mlen; assert(curr > matchIndex3); assert(mnum=3D=3D0); /* no prior solution */ - matches[0].off =3D STORE_OFFSET(curr - matchIndex3); + matches[0].off =3D OFFSET_TO_OFFBASE(curr - matchIndex3); matches[0].len =3D (U32)mlen; mnum =3D 1; if ( (mlen > sufficient_len) | @@ -706,13 +742,13 @@ U32 ZSTD_insertBtAndGetAllMatches ( } =20 if (matchLength > bestLength) { - DEBUGLOG(8, "found match of length %u at distance %u (offCode= =3D%u)", - (U32)matchLength, curr - matchIndex, STORE_OFFSET(curr= - matchIndex)); + DEBUGLOG(8, "found match of length %u at distance %u (offBase= =3D%u)", + (U32)matchLength, curr - matchIndex, OFFSET_TO_OFFBASE= (curr - matchIndex)); assert(matchEndIdx > matchIndex); if (matchLength > matchEndIdx - matchIndex) matchEndIdx =3D matchIndex + (U32)matchLength; bestLength =3D matchLength; - matches[mnum].off =3D STORE_OFFSET(curr - matchIndex); + matches[mnum].off =3D OFFSET_TO_OFFBASE(curr - matchIndex); matches[mnum].len =3D (U32)matchLength; mnum++; if ( (matchLength > ZSTD_OPT_NUM) @@ -754,12 +790,12 @@ U32 ZSTD_insertBtAndGetAllMatches ( =20 if (matchLength > bestLength) { matchIndex =3D dictMatchIndex + dmsIndexDelta; - DEBUGLOG(8, "found dms match of length %u at distance %u (= offCode=3D%u)", - (U32)matchLength, curr - matchIndex, STORE_OFFSET(= curr - matchIndex)); + DEBUGLOG(8, "found dms match of length %u at distance %u (= offBase=3D%u)", + (U32)matchLength, curr - matchIndex, OFFSET_TO_OFF= BASE(curr - matchIndex)); if (matchLength > matchEndIdx - matchIndex) matchEndIdx =3D matchIndex + (U32)matchLength; bestLength =3D matchLength; - matches[mnum].off =3D STORE_OFFSET(curr - matchIndex); + matches[mnum].off =3D OFFSET_TO_OFFBASE(curr - matchIndex); matches[mnum].len =3D (U32)matchLength; mnum++; if ( (matchLength > ZSTD_OPT_NUM) @@ -784,7 +820,7 
@@ U32 ZSTD_insertBtAndGetAllMatches ( =20 typedef U32 (*ZSTD_getAllMatchesFn)( ZSTD_match_t*, - ZSTD_matchState_t*, + ZSTD_MatchState_t*, U32*, const BYTE*, const BYTE*, @@ -792,9 +828,11 @@ typedef U32 (*ZSTD_getAllMatchesFn)( U32 const ll0, U32 const lengthToBeat); =20 -FORCE_INLINE_TEMPLATE U32 ZSTD_btGetAllMatches_internal( +FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +U32 ZSTD_btGetAllMatches_internal( ZSTD_match_t* matches, - ZSTD_matchState_t* ms, + ZSTD_MatchState_t* ms, U32* nextToUpdate3, const BYTE* ip, const BYTE* const iHighLimit, @@ -817,7 +855,7 @@ FORCE_INLINE_TEMPLATE U32 ZSTD_btGetAllMatches_internal( #define GEN_ZSTD_BT_GET_ALL_MATCHES_(dictMode, mls) \ static U32 ZSTD_BT_GET_ALL_MATCHES_FN(dictMode, mls)( \ ZSTD_match_t* matches, \ - ZSTD_matchState_t* ms, \ + ZSTD_MatchState_t* ms, \ U32* nextToUpdate3, \ const BYTE* ip, \ const BYTE* const iHighLimit, \ @@ -849,7 +887,7 @@ GEN_ZSTD_BT_GET_ALL_MATCHES(dictMatchState) } =20 static ZSTD_getAllMatchesFn -ZSTD_selectBtGetAllMatches(ZSTD_matchState_t const* ms, ZSTD_dictMode_e co= nst dictMode) +ZSTD_selectBtGetAllMatches(ZSTD_MatchState_t const* ms, ZSTD_dictMode_e co= nst dictMode) { ZSTD_getAllMatchesFn const getAllMatchesFns[3][4] =3D { ZSTD_BT_GET_ALL_MATCHES_ARRAY(noDict), @@ -868,7 +906,7 @@ ZSTD_selectBtGetAllMatches(ZSTD_matchState_t const* ms,= ZSTD_dictMode_e const di =20 /* Struct containing info needed to make decision about ldm inclusion */ typedef struct { - rawSeqStore_t seqStore; /* External match candidates store for this = block */ + RawSeqStore_t seqStore; /* External match candidates store for this = block */ U32 startPosInBlock; /* Start position of the current match candi= date */ U32 endPosInBlock; /* End position of the current match candida= te */ U32 offset; /* Offset of the match candidate */ @@ -878,7 +916,7 @@ typedef struct { * Moves forward in @rawSeqStore by @nbBytes, * which will update the fields 'pos' and 'posInSequence'. */ -static void ZSTD_optLdm_skipRawSeqStoreBytes(rawSeqStore_t* rawSeqStore, s= ize_t nbBytes) +static void ZSTD_optLdm_skipRawSeqStoreBytes(RawSeqStore_t* rawSeqStore, s= ize_t nbBytes) { U32 currPos =3D (U32)(rawSeqStore->posInSequence + nbBytes); while (currPos && rawSeqStore->pos < rawSeqStore->size) { @@ -935,7 +973,7 @@ ZSTD_opt_getNextMatchAndUpdateSeqStore(ZSTD_optLdm_t* o= ptLdm, U32 currPosInBlock return; } =20 - /* Matches may be < MINMATCH by this process. In that case, we will re= ject them + /* Matches may be < minMatch by this process. In that case, we will re= ject them when we are deciding whether or not to add the ldm */ optLdm->startPosInBlock =3D currPosInBlock + literalsBytesRemaining; optLdm->endPosInBlock =3D optLdm->startPosInBlock + matchBytesRemainin= g; @@ -957,25 +995,26 @@ ZSTD_opt_getNextMatchAndUpdateSeqStore(ZSTD_optLdm_t*= optLdm, U32 currPosInBlock * into 'matches'. Maintains the correct ordering of 'matches'. 
*/ static void ZSTD_optLdm_maybeAddMatch(ZSTD_match_t* matches, U32* nbMatche= s, - const ZSTD_optLdm_t* optLdm, U32 cur= rPosInBlock) + const ZSTD_optLdm_t* optLdm, U32 cur= rPosInBlock, + U32 minMatch) { U32 const posDiff =3D currPosInBlock - optLdm->startPosInBlock; - /* Note: ZSTD_match_t actually contains offCode and matchLength (befor= e subtracting MINMATCH) */ + /* Note: ZSTD_match_t actually contains offBase and matchLength (befor= e subtracting MINMATCH) */ U32 const candidateMatchLength =3D optLdm->endPosInBlock - optLdm->sta= rtPosInBlock - posDiff; =20 /* Ensure that current block position is not outside of the match */ if (currPosInBlock < optLdm->startPosInBlock || currPosInBlock >=3D optLdm->endPosInBlock - || candidateMatchLength < MINMATCH) { + || candidateMatchLength < minMatch) { return; } =20 if (*nbMatches =3D=3D 0 || ((candidateMatchLength > matches[*nbMatches= -1].len) && *nbMatches < ZSTD_OPT_NUM)) { - U32 const candidateOffCode =3D STORE_OFFSET(optLdm->offset); - DEBUGLOG(6, "ZSTD_optLdm_maybeAddMatch(): Adding ldm candidate mat= ch (offCode: %u matchLength %u) at block position=3D%u", - candidateOffCode, candidateMatchLength, currPosInBlock); + U32 const candidateOffBase =3D OFFSET_TO_OFFBASE(optLdm->offset); + DEBUGLOG(6, "ZSTD_optLdm_maybeAddMatch(): Adding ldm candidate mat= ch (offBase: %u matchLength %u) at block position=3D%u", + candidateOffBase, candidateMatchLength, currPosInBlock); matches[*nbMatches].len =3D candidateMatchLength; - matches[*nbMatches].off =3D candidateOffCode; + matches[*nbMatches].off =3D candidateOffBase; (*nbMatches)++; } } @@ -986,7 +1025,8 @@ static void ZSTD_optLdm_maybeAddMatch(ZSTD_match_t* ma= tches, U32* nbMatches, static void ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* optLdm, ZSTD_match_t* matches, U32* nbMatches, - U32 currPosInBlock, U32 remainingBytes) + U32 currPosInBlock, U32 remainingBytes, + U32 minMatch) { if (optLdm->seqStore.size =3D=3D 0 || optLdm->seqStore.pos >=3D optLdm= ->seqStore.size) { return; @@ -1003,7 +1043,7 @@ ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* optL= dm, } ZSTD_opt_getNextMatchAndUpdateSeqStore(optLdm, currPosInBlock, rem= ainingBytes); } - ZSTD_optLdm_maybeAddMatch(matches, nbMatches, optLdm, currPosInBlock); + ZSTD_optLdm_maybeAddMatch(matches, nbMatches, optLdm, currPosInBlock, = minMatch); } =20 =20 @@ -1011,11 +1051,6 @@ ZSTD_optLdm_processMatchCandidate(ZSTD_optLdm_t* opt= Ldm, * Optimal parser *********************************/ =20 -static U32 ZSTD_totalLen(ZSTD_optimal_t sol) -{ - return sol.litlen + sol.mlen; -} - #if 0 /* debug */ =20 static void @@ -1033,9 +1068,15 @@ listStats(const U32* table, int lastEltID) =20 #endif =20 -FORCE_INLINE_TEMPLATE size_t -ZSTD_compressBlock_opt_generic(ZSTD_matchState_t* ms, - seqStore_t* seqStore, +#define LIT_PRICE(_p) (int)ZSTD_rawLiteralsCost(_p, 1, optStatePtr, optLev= el) +#define LL_PRICE(_l) (int)ZSTD_litLengthPrice(_l, optStatePtr, optLevel) +#define LL_INCPRICE(_l) (LL_PRICE(_l) - LL_PRICE(_l-1)) + +FORCE_INLINE_TEMPLATE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t +ZSTD_compressBlock_opt_generic(ZSTD_MatchState_t* ms, + SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize, const int optLevel, @@ -1059,9 +1100,11 @@ ZSTD_compressBlock_opt_generic(ZSTD_matchState_t* ms, =20 ZSTD_optimal_t* const opt =3D optStatePtr->priceTable; ZSTD_match_t* const matches =3D optStatePtr->matchTable; - ZSTD_optimal_t lastSequence; + ZSTD_optimal_t lastStretch; ZSTD_optLdm_t optLdm; =20 + ZSTD_memset(&lastStretch, 0, 
sizeof(ZSTD_optimal_t)); + optLdm.seqStore =3D ms->ldmSeqStore ? *ms->ldmSeqStore : kNullRawSeqSt= ore; optLdm.endPosInBlock =3D optLdm.startPosInBlock =3D optLdm.offset =3D = 0; ZSTD_opt_getNextMatchAndUpdateSeqStore(&optLdm, (U32)(ip-istart), (U32= )(iend-ip)); @@ -1082,103 +1125,140 @@ ZSTD_compressBlock_opt_generic(ZSTD_matchState_t*= ms, U32 const ll0 =3D !litlen; U32 nbMatches =3D getAllMatches(matches, ms, &nextToUpdate3, i= p, iend, rep, ll0, minMatch); ZSTD_optLdm_processMatchCandidate(&optLdm, matches, &nbMatches, - (U32)(ip-istart), (U32)(iend= - ip)); - if (!nbMatches) { ip++; continue; } + (U32)(ip-istart), (U32)(iend= -ip), + minMatch); + if (!nbMatches) { + DEBUGLOG(8, "no match found at cPos %u", (unsigned)(ip-ist= art)); + ip++; + continue; + } + + /* Match found: let's store this solution, and eventually find= more candidates. + * During this forward pass, @opt is used to store stretches, + * defined as "a match followed by N literals". + * Note how this is different from a Sequence, which is "N lit= erals followed by a match". + * Storing stretches allows us to store different match predec= essors + * for each literal position part of a literals run. */ =20 /* initialize opt[0] */ - { U32 i ; for (i=3D0; i immediate encoding */ { U32 const maxML =3D matches[nbMatches-1].len; - U32 const maxOffcode =3D matches[nbMatches-1].off; - DEBUGLOG(6, "found %u matches of maxLength=3D%u and maxOff= Code=3D%u at cPos=3D%u =3D> start new series", - nbMatches, maxML, maxOffcode, (U32)(ip-prefixS= tart)); + U32 const maxOffBase =3D matches[nbMatches-1].off; + DEBUGLOG(6, "found %u matches of maxLength=3D%u and maxOff= Base=3D%u at cPos=3D%u =3D> start new series", + nbMatches, maxML, maxOffBase, (U32)(ip-prefixS= tart)); =20 if (maxML > sufficient_len) { - lastSequence.litlen =3D litlen; - lastSequence.mlen =3D maxML; - lastSequence.off =3D maxOffcode; - DEBUGLOG(6, "large match (%u>%u), immediate encoding", + lastStretch.litlen =3D 0; + lastStretch.mlen =3D maxML; + lastStretch.off =3D maxOffBase; + DEBUGLOG(6, "large match (%u>%u) =3D> immediate encodi= ng", maxML, sufficient_len); cur =3D 0; - last_pos =3D ZSTD_totalLen(lastSequence); + last_pos =3D maxML; goto _shortestPath; } } =20 /* set prices for first matches starting position =3D=3D 0 */ assert(opt[0].price >=3D 0); - { U32 const literalsPrice =3D (U32)opt[0].price + ZSTD_litLe= ngthPrice(0, optStatePtr, optLevel); - U32 pos; + { U32 pos; U32 matchNb; for (pos =3D 1; pos < minMatch; pos++) { - opt[pos].price =3D ZSTD_MAX_PRICE; /* mlen, litlen a= nd price will be fixed during forward scanning */ + opt[pos].price =3D ZSTD_MAX_PRICE; + opt[pos].mlen =3D 0; + opt[pos].litlen =3D litlen + pos; } for (matchNb =3D 0; matchNb < nbMatches; matchNb++) { - U32 const offcode =3D matches[matchNb].off; + U32 const offBase =3D matches[matchNb].off; U32 const end =3D matches[matchNb].len; for ( ; pos <=3D end ; pos++ ) { - U32 const matchPrice =3D ZSTD_getMatchPrice(offcod= e, pos, optStatePtr, optLevel); - U32 const sequencePrice =3D literalsPrice + matchP= rice; + int const matchPrice =3D (int)ZSTD_getMatchPrice(o= ffBase, pos, optStatePtr, optLevel); + int const sequencePrice =3D opt[0].price + matchPr= ice; DEBUGLOG(7, "rPos:%u =3D> set initial price : %.2f= ", pos, ZSTD_fCost(sequencePrice)); opt[pos].mlen =3D pos; - opt[pos].off =3D offcode; - opt[pos].litlen =3D litlen; - opt[pos].price =3D (int)sequencePrice; - } } + opt[pos].off =3D offBase; + opt[pos].litlen =3D 0; /* end of match */ + opt[pos].price =3D sequencePrice + 
LL_PRICE(0); + } + } last_pos =3D pos-1; + opt[pos].price =3D ZSTD_MAX_PRICE; } } =20 /* check further positions */ for (cur =3D 1; cur <=3D last_pos; cur++) { const BYTE* const inr =3D ip + cur; - assert(cur < ZSTD_OPT_NUM); - DEBUGLOG(7, "cPos:%zi=3D=3DrPos:%u", inr-istart, cur) + assert(cur <=3D ZSTD_OPT_NUM); + DEBUGLOG(7, "cPos:%i=3D=3DrPos:%u", (int)(inr-istart), cur); =20 /* Fix current position with one literal if cheaper */ - { U32 const litlen =3D (opt[cur-1].mlen =3D=3D 0) ? opt[cur-= 1].litlen + 1 : 1; + { U32 const litlen =3D opt[cur-1].litlen + 1; int const price =3D opt[cur-1].price - + (int)ZSTD_rawLiteralsCost(ip+cur-1, 1, o= ptStatePtr, optLevel) - + (int)ZSTD_litLengthPrice(litlen, optStat= ePtr, optLevel) - - (int)ZSTD_litLengthPrice(litlen-1, optSt= atePtr, optLevel); + + LIT_PRICE(ip+cur-1) + + LL_INCPRICE(litlen); assert(price < 1000000000); /* overflow check */ if (price <=3D opt[cur].price) { - DEBUGLOG(7, "cPos:%zi=3D=3DrPos:%u : better price (%.2= f<=3D%.2f) using literal (ll=3D=3D%u) (hist:%u,%u,%u)", - inr-istart, cur, ZSTD_fCost(price), ZSTD_f= Cost(opt[cur].price), litlen, + ZSTD_optimal_t const prevMatch =3D opt[cur]; + DEBUGLOG(7, "cPos:%i=3D=3DrPos:%u : better price (%.2f= <=3D%.2f) using literal (ll=3D=3D%u) (hist:%u,%u,%u)", + (int)(inr-istart), cur, ZSTD_fCost(price),= ZSTD_fCost(opt[cur].price), litlen, opt[cur-1].rep[0], opt[cur-1].rep[1], opt[= cur-1].rep[2]); - opt[cur].mlen =3D 0; - opt[cur].off =3D 0; + opt[cur] =3D opt[cur-1]; opt[cur].litlen =3D litlen; opt[cur].price =3D price; + if ( (optLevel >=3D 1) /* additional check only for hi= gher modes */ + && (prevMatch.litlen =3D=3D 0) /* replace a match */ + && (LL_INCPRICE(1) < 0) /* ll1 is cheaper than ll0 */ + && LIKELY(ip + cur < iend) + ) { + /* check next position, in case it would be cheape= r */ + int with1literal =3D prevMatch.price + LIT_PRICE(i= p+cur) + LL_INCPRICE(1); + int withMoreLiterals =3D price + LIT_PRICE(ip+cur)= + LL_INCPRICE(litlen+1); + DEBUGLOG(7, "then at next rPos %u : match+1lit %.2= f vs %ulits %.2f", + cur+1, ZSTD_fCost(with1literal), litlen+1,= ZSTD_fCost(withMoreLiterals)); + if ( (with1literal < withMoreLiterals) + && (with1literal < opt[cur+1].price) ) { + /* update offset history - before it disappear= s */ + U32 const prev =3D cur - prevMatch.mlen; + Repcodes_t const newReps =3D ZSTD_newRep(opt[p= rev].rep, prevMatch.off, opt[prev].litlen=3D=3D0); + assert(cur >=3D prevMatch.mlen); + DEBUGLOG(7, "=3D=3D> match+1lit is cheaper (%.= 2f < %.2f) (hist:%u,%u,%u) !", + ZSTD_fCost(with1literal), ZSTD_fCo= st(withMoreLiterals), + newReps.rep[0], newReps.rep[1], ne= wReps.rep[2] ); + opt[cur+1] =3D prevMatch; /* mlen & offbase */ + ZSTD_memcpy(opt[cur+1].rep, &newReps, sizeof(R= epcodes_t)); + opt[cur+1].litlen =3D 1; + opt[cur+1].price =3D with1literal; + if (last_pos < cur+1) last_pos =3D cur+1; + } + } } else { - DEBUGLOG(7, "cPos:%zi=3D=3DrPos:%u : literal would cos= t more (%.2f>%.2f) (hist:%u,%u,%u)", - inr-istart, cur, ZSTD_fCost(price), ZSTD_f= Cost(opt[cur].price), - opt[cur].rep[0], opt[cur].rep[1], opt[cur]= .rep[2]); + DEBUGLOG(7, "cPos:%i=3D=3DrPos:%u : literal would cost= more (%.2f>%.2f)", + (int)(inr-istart), cur, ZSTD_fCost(price),= ZSTD_fCost(opt[cur].price)); } } =20 - /* Set the repcodes of the current position. We must do it here - * because we rely on the repcodes of the 2nd to last sequence= being - * correct to set the next chunks repcodes during the backward - * traversal. + /* Offset history is not updated during match comparison. 
+ * Do it here, now that the match is selected and confirmed. */ - ZSTD_STATIC_ASSERT(sizeof(opt[cur].rep) =3D=3D sizeof(repcodes= _t)); + ZSTD_STATIC_ASSERT(sizeof(opt[cur].rep) =3D=3D sizeof(Repcodes= _t)); assert(cur >=3D opt[cur].mlen); - if (opt[cur].mlen !=3D 0) { + if (opt[cur].litlen =3D=3D 0) { + /* just finished a match =3D> alter offset history */ U32 const prev =3D cur - opt[cur].mlen; - repcodes_t const newReps =3D ZSTD_newRep(opt[prev].rep, op= t[cur].off, opt[cur].litlen=3D=3D0); - ZSTD_memcpy(opt[cur].rep, &newReps, sizeof(repcodes_t)); - } else { - ZSTD_memcpy(opt[cur].rep, opt[cur - 1].rep, sizeof(repcode= s_t)); + Repcodes_t const newReps =3D ZSTD_newRep(opt[prev].rep, op= t[cur].off, opt[prev].litlen=3D=3D0); + ZSTD_memcpy(opt[cur].rep, &newReps, sizeof(Repcodes_t)); } =20 /* last match must start at a minimum distance of 8 from oend = */ @@ -1188,38 +1268,37 @@ ZSTD_compressBlock_opt_generic(ZSTD_matchState_t* m= s, =20 if ( (optLevel=3D=3D0) /*static_test*/ && (opt[cur+1].price <=3D opt[cur].price + (BITCOST_MULTIPLI= ER/2)) ) { - DEBUGLOG(7, "move to next rPos:%u : price is <=3D", cur+1); + DEBUGLOG(7, "skip current position : next rPos(%u) price i= s cheaper", cur+1); continue; /* skip unpromising positions; about ~+6% speed= , -0.01 ratio */ } =20 assert(opt[cur].price >=3D 0); - { U32 const ll0 =3D (opt[cur].mlen !=3D 0); - U32 const litlen =3D (opt[cur].mlen =3D=3D 0) ? opt[cur].l= itlen : 0; - U32 const previousPrice =3D (U32)opt[cur].price; - U32 const basePrice =3D previousPrice + ZSTD_litLengthPric= e(0, optStatePtr, optLevel); + { U32 const ll0 =3D (opt[cur].litlen =3D=3D 0); + int const previousPrice =3D opt[cur].price; + int const basePrice =3D previousPrice + LL_PRICE(0); U32 nbMatches =3D getAllMatches(matches, ms, &nextToUpdate= 3, inr, iend, opt[cur].rep, ll0, minMatch); U32 matchNb; =20 ZSTD_optLdm_processMatchCandidate(&optLdm, matches, &nbMat= ches, - (U32)(inr-istart), (U32)= (iend-inr)); + (U32)(inr-istart), (U32)= (iend-inr), + minMatch); =20 if (!nbMatches) { DEBUGLOG(7, "rPos:%u : no match found", cur); continue; } =20 - { U32 const maxML =3D matches[nbMatches-1].len; - DEBUGLOG(7, "cPos:%zi=3D=3DrPos:%u, found %u matches, = of maxLength=3D%u", - inr-istart, cur, nbMatches, maxML); - - if ( (maxML > sufficient_len) - || (cur + maxML >=3D ZSTD_OPT_NUM) ) { - lastSequence.mlen =3D maxML; - lastSequence.off =3D matches[nbMatches-1].off; - lastSequence.litlen =3D litlen; - cur -=3D (opt[cur].mlen=3D=3D0) ? opt[cur].litlen = : 0; /* last sequence is actually only literals, fix cur to last match - n= ote : may underflow, in which case, it's first sequence, and it's okay */ - last_pos =3D cur + ZSTD_totalLen(lastSequence); - if (cur > ZSTD_OPT_NUM) cur =3D 0; /* underflow = =3D> first match */ + { U32 const longestML =3D matches[nbMatches-1].len; + DEBUGLOG(7, "cPos:%i=3D=3DrPos:%u, found %u matches, o= f longest ML=3D%u", + (int)(inr-istart), cur, nbMatches, longest= ML); + + if ( (longestML > sufficient_len) + || (cur + longestML >=3D ZSTD_OPT_NUM) + || (ip + cur + longestML >=3D iend) ) { + lastStretch.mlen =3D longestML; + lastStretch.off =3D matches[nbMatches-1].off; + lastStretch.litlen =3D 0; + last_pos =3D cur + longestML; goto _shortestPath; } } =20 @@ -1230,20 +1309,25 @@ ZSTD_compressBlock_opt_generic(ZSTD_matchState_t* m= s, U32 const startML =3D (matchNb>0) ? 
matches[matchNb-1]= .len+1 : minMatch; U32 mlen; =20 - DEBUGLOG(7, "testing match %u =3D> offCode=3D%4u, mlen= =3D%2u, llen=3D%2u", - matchNb, matches[matchNb].off, lastML, lit= len); + DEBUGLOG(7, "testing match %u =3D> offBase=3D%4u, mlen= =3D%2u, llen=3D%2u", + matchNb, matches[matchNb].off, lastML, opt= [cur].litlen); =20 for (mlen =3D lastML; mlen >=3D startML; mlen--) { /*= scan downward */ U32 const pos =3D cur + mlen; - int const price =3D (int)basePrice + (int)ZSTD_get= MatchPrice(offset, mlen, optStatePtr, optLevel); + int const price =3D basePrice + (int)ZSTD_getMatch= Price(offset, mlen, optStatePtr, optLevel); =20 if ((pos > last_pos) || (price < opt[pos].price)) { DEBUGLOG(7, "rPos:%u (ml=3D%2u) =3D> new bette= r price (%.2f<%.2f)", pos, mlen, ZSTD_fCost(price), ZSTD= _fCost(opt[pos].price)); - while (last_pos < pos) { opt[last_pos+1].price= =3D ZSTD_MAX_PRICE; last_pos++; } /* fill empty positions */ + while (last_pos < pos) { + /* fill empty positions, for future compar= isons */ + last_pos++; + opt[last_pos].price =3D ZSTD_MAX_PRICE; + opt[last_pos].litlen =3D !0; /* just need= s to be !=3D 0, to mean "not an end of match" */ + } opt[pos].mlen =3D mlen; opt[pos].off =3D offset; - opt[pos].litlen =3D litlen; + opt[pos].litlen =3D 0; opt[pos].price =3D price; } else { DEBUGLOG(7, "rPos:%u (ml=3D%2u) =3D> new price= is worse (%.2f>=3D%.2f)", @@ -1251,55 +1335,89 @@ ZSTD_compressBlock_opt_generic(ZSTD_matchState_t* m= s, if (optLevel=3D=3D0) break; /* early update a= bort; gets ~+10% speed for about -0.01 ratio loss */ } } } } + opt[last_pos+1].price =3D ZSTD_MAX_PRICE; } /* for (cur =3D 1; cur <=3D last_pos; cur++) */ =20 - lastSequence =3D opt[last_pos]; - cur =3D last_pos > ZSTD_totalLen(lastSequence) ? last_pos - ZSTD_t= otalLen(lastSequence) : 0; /* single sequence, and it starts before `ip` */ - assert(cur < ZSTD_OPT_NUM); /* control overflow*/ + lastStretch =3D opt[last_pos]; + assert(cur >=3D lastStretch.mlen); + cur =3D last_pos - lastStretch.mlen; =20 _shortestPath: /* cur, last_pos, best_mlen, best_off have to be set */ assert(opt[0].mlen =3D=3D 0); + assert(last_pos >=3D lastStretch.mlen); + assert(cur =3D=3D last_pos - lastStretch.mlen); =20 - /* Set the next chunk's repcodes based on the repcodes of the begi= nning - * of the last match, and the last sequence. This avoids us having= to - * update them while traversing the sequences. - */ - if (lastSequence.mlen !=3D 0) { - repcodes_t const reps =3D ZSTD_newRep(opt[cur].rep, lastSequen= ce.off, lastSequence.litlen=3D=3D0); - ZSTD_memcpy(rep, &reps, sizeof(reps)); + if (lastStretch.mlen=3D=3D0) { + /* no solution : all matches have been converted into literals= */ + assert(lastStretch.litlen =3D=3D (ip - anchor) + last_pos); + ip +=3D last_pos; + continue; + } + assert(lastStretch.off > 0); + + /* Update offset history */ + if (lastStretch.litlen =3D=3D 0) { + /* finishing on a match : update offset history */ + Repcodes_t const reps =3D ZSTD_newRep(opt[cur].rep, lastStretc= h.off, opt[cur].litlen=3D=3D0); + ZSTD_memcpy(rep, &reps, sizeof(Repcodes_t)); } else { - ZSTD_memcpy(rep, opt[cur].rep, sizeof(repcodes_t)); + ZSTD_memcpy(rep, lastStretch.rep, sizeof(Repcodes_t)); + assert(cur >=3D lastStretch.litlen); + cur -=3D lastStretch.litlen; } =20 - { U32 const storeEnd =3D cur + 1; + /* Let's write the shortest path solution. + * It is stored in @opt in reverse order, + * starting from @storeEnd (=3D=3Dcur+2), + * effectively partially @opt overwriting. 
+ * Content is changed too: + * - So far, @opt stored stretches, aka a match followed by litera= ls + * - Now, it will store sequences, aka literals followed by a match + */ + { U32 const storeEnd =3D cur + 2; U32 storeStart =3D storeEnd; - U32 seqPos =3D cur; + U32 stretchPos =3D cur; =20 DEBUGLOG(6, "start reverse traversal (last_pos:%u, cur:%u)", last_pos, cur); (void)last_pos; - assert(storeEnd < ZSTD_OPT_NUM); - DEBUGLOG(6, "last sequence copied into pos=3D%u (llen=3D%u,mle= n=3D%u,ofc=3D%u)", - storeEnd, lastSequence.litlen, lastSequence.mlen, = lastSequence.off); - opt[storeEnd] =3D lastSequence; - while (seqPos > 0) { - U32 const backDist =3D ZSTD_totalLen(opt[seqPos]); + assert(storeEnd < ZSTD_OPT_SIZE); + DEBUGLOG(6, "last stretch copied into pos=3D%u (llen=3D%u,mlen= =3D%u,ofc=3D%u)", + storeEnd, lastStretch.litlen, lastStretch.mlen, la= stStretch.off); + if (lastStretch.litlen > 0) { + /* last "sequence" is unfinished: just a bunch of literals= */ + opt[storeEnd].litlen =3D lastStretch.litlen; + opt[storeEnd].mlen =3D 0; + storeStart =3D storeEnd-1; + opt[storeStart] =3D lastStretch; + } else { + opt[storeEnd] =3D lastStretch; /* note: litlen will be fi= xed */ + storeStart =3D storeEnd; + } + while (1) { + ZSTD_optimal_t nextStretch =3D opt[stretchPos]; + opt[storeStart].litlen =3D nextStretch.litlen; + DEBUGLOG(6, "selected sequence (llen=3D%u,mlen=3D%u,ofc=3D= %u)", + opt[storeStart].litlen, opt[storeStart].mlen, = opt[storeStart].off); + if (nextStretch.mlen =3D=3D 0) { + /* reaching beginning of segment */ + break; + } storeStart--; - DEBUGLOG(6, "sequence from rPos=3D%u copied into pos=3D%u = (llen=3D%u,mlen=3D%u,ofc=3D%u)", - seqPos, storeStart, opt[seqPos].litlen, opt[se= qPos].mlen, opt[seqPos].off); - opt[storeStart] =3D opt[seqPos]; - seqPos =3D (seqPos > backDist) ? 
seqPos - backDist : 0; + opt[storeStart] =3D nextStretch; /* note: litlen will be f= ixed */ + assert(nextStretch.litlen + nextStretch.mlen <=3D stretchP= os); + stretchPos -=3D nextStretch.litlen + nextStretch.mlen; } =20 /* save sequences */ - DEBUGLOG(6, "sending selected sequences into seqStore") + DEBUGLOG(6, "sending selected sequences into seqStore"); { U32 storePos; for (storePos=3DstoreStart; storePos <=3D storeEnd; storeP= os++) { U32 const llen =3D opt[storePos].litlen; U32 const mlen =3D opt[storePos].mlen; - U32 const offCode =3D opt[storePos].off; + U32 const offBase =3D opt[storePos].off; U32 const advance =3D llen + mlen; - DEBUGLOG(6, "considering seq starting at %zi, llen=3D%= u, mlen=3D%u", - anchor - istart, (unsigned)llen, (unsigned= )mlen); + DEBUGLOG(6, "considering seq starting at %i, llen=3D%u= , mlen=3D%u", + (int)(anchor - istart), (unsigned)llen, (u= nsigned)mlen); =20 if (mlen=3D=3D0) { /* only literals =3D> must be last= "sequence", actually starting a new stream of sequences */ assert(storePos =3D=3D storeEnd); /* must be las= t sequence */ @@ -1308,11 +1426,14 @@ ZSTD_compressBlock_opt_generic(ZSTD_matchState_t* m= s, } =20 assert(anchor + llen <=3D iend); - ZSTD_updateStats(optStatePtr, llen, anchor, offCode, m= len); - ZSTD_storeSeq(seqStore, llen, anchor, iend, offCode, m= len); + ZSTD_updateStats(optStatePtr, llen, anchor, offBase, m= len); + ZSTD_storeSeq(seqStore, llen, anchor, iend, offBase, m= len); anchor +=3D advance; ip =3D anchor; } } + DEBUGLOG(7, "new offset history : %u, %u, %u", rep[0], rep[1],= rep[2]); + + /* update all costs */ ZSTD_setBasePrices(optStatePtr, optLevel); } } /* while (ip < ilimit) */ @@ -1320,42 +1441,51 @@ ZSTD_compressBlock_opt_generic(ZSTD_matchState_t* m= s, /* Return the last literals size */ return (size_t)(iend - anchor); } +#endif /* build exclusions */ =20 +#ifndef ZSTD_EXCLUDE_BTOPT_BLOCK_COMPRESSOR static size_t ZSTD_compressBlock_opt0( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize, const ZSTD_dictMode_e dictMode) { return ZSTD_compressBlock_opt_generic(ms, seqStore, rep, src, srcSize,= 0 /* optLevel */, dictMode); } +#endif =20 +#ifndef ZSTD_EXCLUDE_BTULTRA_BLOCK_COMPRESSOR static size_t ZSTD_compressBlock_opt2( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize, const ZSTD_dictMode_e dictMode) { return ZSTD_compressBlock_opt_generic(ms, seqStore, rep, src, srcSize,= 2 /* optLevel */, dictMode); } +#endif =20 +#ifndef ZSTD_EXCLUDE_BTOPT_BLOCK_COMPRESSOR size_t ZSTD_compressBlock_btopt( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize) { DEBUGLOG(5, "ZSTD_compressBlock_btopt"); return ZSTD_compressBlock_opt0(ms, seqStore, rep, src, srcSize, ZSTD_n= oDict); } +#endif =20 =20 =20 =20 +#ifndef ZSTD_EXCLUDE_BTULTRA_BLOCK_COMPRESSOR /* ZSTD_initStats_ultra(): * make a first compression pass, just to seed stats with more accurate st= arting values. * only works on first block, with no dictionary and no ldm. - * this function cannot error, hence its contract must be respected. + * this function cannot error out, its narrow contract must be respected. 
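+ * For illustration, the intended call pattern, as implemented by
+ * ZSTD_compressBlock_btultra2() below, is roughly:
+ *     ZSTD_initStats_ultra(ms, seqStore, rep, src, srcSize);
+ *     ZSTD_compressBlock_opt2(ms, seqStore, rep, src, srcSize, ZSTD_noDict);
+ * where the first pass only seeds ms->opt statistics for the second pass.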
*/ -static void -ZSTD_initStats_ultra(ZSTD_matchState_t* ms, - seqStore_t* seqStore, - U32 rep[ZSTD_REP_NUM], - const void* src, size_t srcSize) +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +void ZSTD_initStats_ultra(ZSTD_MatchState_t* ms, + SeqStore_t* seqStore, + U32 rep[ZSTD_REP_NUM], + const void* src, size_t srcSize) { U32 tmpRep[ZSTD_REP_NUM]; /* updated rep codes will sink here */ ZSTD_memcpy(tmpRep, rep, sizeof(tmpRep)); @@ -1368,7 +1498,7 @@ ZSTD_initStats_ultra(ZSTD_matchState_t* ms, =20 ZSTD_compressBlock_opt2(ms, seqStore, tmpRep, src, srcSize, ZSTD_noDic= t); /* generate stats into ms->opt*/ =20 - /* invalidate first scan from history */ + /* invalidate first scan from history, only keep entropy stats */ ZSTD_resetSeqStore(seqStore); ms->window.base -=3D srcSize; ms->window.dictLimit +=3D (U32)srcSize; @@ -1378,7 +1508,7 @@ ZSTD_initStats_ultra(ZSTD_matchState_t* ms, } =20 size_t ZSTD_compressBlock_btultra( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize) { DEBUGLOG(5, "ZSTD_compressBlock_btultra (srcSize=3D%zu)", srcSize); @@ -1386,16 +1516,16 @@ size_t ZSTD_compressBlock_btultra( } =20 size_t ZSTD_compressBlock_btultra2( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize) { U32 const curr =3D (U32)((const BYTE*)src - ms->window.base); DEBUGLOG(5, "ZSTD_compressBlock_btultra2 (srcSize=3D%zu)", srcSize); =20 - /* 2-pass strategy: + /* 2-passes strategy: * this strategy makes a first pass over first block to collect statis= tics - * and seed next round's statistics with it. - * After 1st pass, function forgets everything, and starts a new block. + * in order to seed next round's statistics with it. + * After 1st pass, function forgets history, and starts a new block. * Consequently, this can only work if no data has been previously loa= ded in tables, * aka, no dictionary, no prefix, no ldm preprocessing. 
* The compression ratio gain is generally small (~0.5% on first block= ), @@ -1404,42 +1534,47 @@ size_t ZSTD_compressBlock_btultra2( if ( (ms->opt.litLengthSum=3D=3D0) /* first block */ && (seqStore->sequences =3D=3D seqStore->sequencesStart) /* no ldm = */ && (ms->window.dictLimit =3D=3D ms->window.lowLimit) /* no diction= ary */ - && (curr =3D=3D ms->window.dictLimit) /* start of frame, nothing a= lready loaded nor skipped */ - && (srcSize > ZSTD_PREDEF_THRESHOLD) + && (curr =3D=3D ms->window.dictLimit) /* start of frame, nothing = already loaded nor skipped */ + && (srcSize > ZSTD_PREDEF_THRESHOLD) /* input large enough to not em= ploy default stats */ ) { ZSTD_initStats_ultra(ms, seqStore, rep, src, srcSize); } =20 return ZSTD_compressBlock_opt2(ms, seqStore, rep, src, srcSize, ZSTD_n= oDict); } +#endif =20 +#ifndef ZSTD_EXCLUDE_BTOPT_BLOCK_COMPRESSOR size_t ZSTD_compressBlock_btopt_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize) { return ZSTD_compressBlock_opt0(ms, seqStore, rep, src, srcSize, ZSTD_d= ictMatchState); } =20 -size_t ZSTD_compressBlock_btultra_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_btopt_extDict( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize) { - return ZSTD_compressBlock_opt2(ms, seqStore, rep, src, srcSize, ZSTD_d= ictMatchState); + return ZSTD_compressBlock_opt0(ms, seqStore, rep, src, srcSize, ZSTD_e= xtDict); } +#endif =20 -size_t ZSTD_compressBlock_btopt_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_BTULTRA_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_btultra_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize) { - return ZSTD_compressBlock_opt0(ms, seqStore, rep, src, srcSize, ZSTD_e= xtDict); + return ZSTD_compressBlock_opt2(ms, seqStore, rep, src, srcSize, ZSTD_d= ictMatchState); } =20 size_t ZSTD_compressBlock_btultra_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], const void* src, size_t srcSize) { return ZSTD_compressBlock_opt2(ms, seqStore, rep, src, srcSize, ZSTD_e= xtDict); } +#endif =20 /* note : no btultra2 variant for extDict nor dictMatchState, * because btultra2 is not meant to work with dictionaries diff --git a/lib/zstd/compress/zstd_opt.h b/lib/zstd/compress/zstd_opt.h index 22b862858ba7..fbdc540ec9d1 100644 --- a/lib/zstd/compress/zstd_opt.h +++ b/lib/zstd/compress/zstd_opt.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -11,40 +12,62 @@ #ifndef ZSTD_OPT_H #define ZSTD_OPT_H =20 - #include "zstd_compress_internal.h" =20 +#if !defined(ZSTD_EXCLUDE_BTLAZY2_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_BTOPT_BLOCK_COMPRESSOR) \ + || !defined(ZSTD_EXCLUDE_BTULTRA_BLOCK_COMPRESSOR) /* used in ZSTD_loadDictionaryContent() */ -void ZSTD_updateTree(ZSTD_matchState_t* ms, const BYTE* ip, const BYTE* ie= nd); +void ZSTD_updateTree(ZSTD_MatchState_t* ms, const BYTE* ip, const BYTE* ie= nd); +#endif =20 +#ifndef ZSTD_EXCLUDE_BTOPT_BLOCK_COMPRESSOR size_t ZSTD_compressBlock_btopt( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_btultra( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_btopt_dictMatchState( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); -size_t ZSTD_compressBlock_btultra2( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +size_t ZSTD_compressBlock_btopt_extDict( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); =20 +#define ZSTD_COMPRESSBLOCK_BTOPT ZSTD_compressBlock_btopt +#define ZSTD_COMPRESSBLOCK_BTOPT_DICTMATCHSTATE ZSTD_compressBlock_btopt_d= ictMatchState +#define ZSTD_COMPRESSBLOCK_BTOPT_EXTDICT ZSTD_compressBlock_btopt_extDict +#else +#define ZSTD_COMPRESSBLOCK_BTOPT NULL +#define ZSTD_COMPRESSBLOCK_BTOPT_DICTMATCHSTATE NULL +#define ZSTD_COMPRESSBLOCK_BTOPT_EXTDICT NULL +#endif =20 -size_t ZSTD_compressBlock_btopt_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], +#ifndef ZSTD_EXCLUDE_BTULTRA_BLOCK_COMPRESSOR +size_t ZSTD_compressBlock_btultra( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_btultra_dictMatchState( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], - void const* src, size_t srcSize); - -size_t ZSTD_compressBlock_btopt_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); size_t ZSTD_compressBlock_btultra_extDict( - ZSTD_matchState_t* ms, seqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], void const* src, size_t srcSize); =20 /* note : no btultra2 variant for extDict nor dictMatchState, * because btultra2 is not meant to work with dictionaries * and is only specific for the first block (no prefix) */ +size_t ZSTD_compressBlock_btultra2( + ZSTD_MatchState_t* ms, SeqStore_t* seqStore, U32 rep[ZSTD_REP_NUM], + void const* src, size_t srcSize); =20 +#define ZSTD_COMPRESSBLOCK_BTULTRA ZSTD_compressBlock_btultra +#define ZSTD_COMPRESSBLOCK_BTULTRA_DICTMATCHSTATE ZSTD_compressBlock_btult= ra_dictMatchState +#define ZSTD_COMPRESSBLOCK_BTULTRA_EXTDICT ZSTD_compressBlock_btultra_extD= ict +#define ZSTD_COMPRESSBLOCK_BTULTRA2 ZSTD_compressBlock_btultra2 +#else +#define ZSTD_COMPRESSBLOCK_BTULTRA NULL +#define ZSTD_COMPRESSBLOCK_BTULTRA_DICTMATCHSTATE NULL +#define ZSTD_COMPRESSBLOCK_BTULTRA_EXTDICT NULL +#define ZSTD_COMPRESSBLOCK_BTULTRA2 NULL +#endif =20 #endif /* ZSTD_OPT_H */ diff --git a/lib/zstd/compress/zstd_preSplit.c b/lib/zstd/compress/zstd_pre= 
Split.c new file mode 100644 index 000000000000..7d9403c9a3bc --- /dev/null +++ b/lib/zstd/compress/zstd_preSplit.c @@ -0,0 +1,239 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause +/* + * Copyright (c) Meta Platforms, Inc. and affiliates. + * All rights reserved. + * + * This source code is licensed under both the BSD-style license (found in= the + * LICENSE file in the root directory of this source tree) and the GPLv2 (= found + * in the COPYING file in the root directory of this source tree). + * You may select, at your option, one of the above-listed licenses. + */ + +#include "../common/compiler.h" /* ZSTD_ALIGNOF */ +#include "../common/mem.h" /* S64 */ +#include "../common/zstd_deps.h" /* ZSTD_memset */ +#include "../common/zstd_internal.h" /* ZSTD_STATIC_ASSERT */ +#include "hist.h" /* HIST_add */ +#include "zstd_preSplit.h" + + +#define BLOCKSIZE_MIN 3500 +#define THRESHOLD_PENALTY_RATE 16 +#define THRESHOLD_BASE (THRESHOLD_PENALTY_RATE - 2) +#define THRESHOLD_PENALTY 3 + +#define HASHLENGTH 2 +#define HASHLOG_MAX 10 +#define HASHTABLESIZE (1 << HASHLOG_MAX) +#define HASHMASK (HASHTABLESIZE - 1) +#define KNUTH 0x9e3779b9 + +/* for hashLog > 8, hash 2 bytes. + * for hashLog =3D=3D 8, just take the byte, no hashing. + * The speed of this method relies on compile-time constant propagation */ +FORCE_INLINE_TEMPLATE unsigned hash2(const void *p, unsigned hashLog) +{ + assert(hashLog >=3D 8); + if (hashLog =3D=3D 8) return (U32)((const BYTE*)p)[0]; + assert(hashLog <=3D HASHLOG_MAX); + return (U32)(MEM_read16(p)) * KNUTH >> (32 - hashLog); +} + + +typedef struct { + unsigned events[HASHTABLESIZE]; + size_t nbEvents; +} Fingerprint; +typedef struct { + Fingerprint pastEvents; + Fingerprint newEvents; +} FPStats; + +static void initStats(FPStats* fpstats) +{ + ZSTD_memset(fpstats, 0, sizeof(FPStats)); +} + +FORCE_INLINE_TEMPLATE void +addEvents_generic(Fingerprint* fp, const void* src, size_t srcSize, size_t= samplingRate, unsigned hashLog) +{ + const char* p =3D (const char*)src; + size_t limit =3D srcSize - HASHLENGTH + 1; + size_t n; + assert(srcSize >=3D HASHLENGTH); + for (n =3D 0; n < limit; n+=3DsamplingRate) { + fp->events[hash2(p+n, hashLog)]++; + } + fp->nbEvents +=3D limit/samplingRate; +} + +FORCE_INLINE_TEMPLATE void +recordFingerprint_generic(Fingerprint* fp, const void* src, size_t srcSize= , size_t samplingRate, unsigned hashLog) +{ + ZSTD_memset(fp, 0, sizeof(unsigned) * ((size_t)1 << hashLog)); + fp->nbEvents =3D 0; + addEvents_generic(fp, src, srcSize, samplingRate, hashLog); +} + +typedef void (*RecordEvents_f)(Fingerprint* fp, const void* src, size_t sr= cSize); + +#define FP_RECORD(_rate) ZSTD_recordFingerprint_##_rate + +#define ZSTD_GEN_RECORD_FINGERPRINT(_rate, _hSize) = \ + static void FP_RECORD(_rate)(Fingerprint* fp, const void* src, size_t = srcSize) \ + { = \ + recordFingerprint_generic(fp, src, srcSize, _rate, _hSize); = \ + } + +ZSTD_GEN_RECORD_FINGERPRINT(1, 10) +ZSTD_GEN_RECORD_FINGERPRINT(5, 10) +ZSTD_GEN_RECORD_FINGERPRINT(11, 9) +ZSTD_GEN_RECORD_FINGERPRINT(43, 8) + + +static U64 abs64(S64 s64) { return (U64)((s64 < 0) ? 
-s64 : s64); } + +static U64 fpDistance(const Fingerprint* fp1, const Fingerprint* fp2, unsi= gned hashLog) +{ + U64 distance =3D 0; + size_t n; + assert(hashLog <=3D HASHLOG_MAX); + for (n =3D 0; n < ((size_t)1 << hashLog); n++) { + distance +=3D + abs64((S64)fp1->events[n] * (S64)fp2->nbEvents - (S64)fp2->eve= nts[n] * (S64)fp1->nbEvents); + } + return distance; +} + +/* Compare newEvents with pastEvents + * return 1 when considered "too different" + */ +static int compareFingerprints(const Fingerprint* ref, + const Fingerprint* newfp, + int penalty, + unsigned hashLog) +{ + assert(ref->nbEvents > 0); + assert(newfp->nbEvents > 0); + { U64 p50 =3D (U64)ref->nbEvents * (U64)newfp->nbEvents; + U64 deviation =3D fpDistance(ref, newfp, hashLog); + U64 threshold =3D p50 * (U64)(THRESHOLD_BASE + penalty) / THRESHOL= D_PENALTY_RATE; + return deviation >=3D threshold; + } +} + +static void mergeEvents(Fingerprint* acc, const Fingerprint* newfp) +{ + size_t n; + for (n =3D 0; n < HASHTABLESIZE; n++) { + acc->events[n] +=3D newfp->events[n]; + } + acc->nbEvents +=3D newfp->nbEvents; +} + +static void flushEvents(FPStats* fpstats) +{ + size_t n; + for (n =3D 0; n < HASHTABLESIZE; n++) { + fpstats->pastEvents.events[n] =3D fpstats->newEvents.events[n]; + } + fpstats->pastEvents.nbEvents =3D fpstats->newEvents.nbEvents; + ZSTD_memset(&fpstats->newEvents, 0, sizeof(fpstats->newEvents)); +} + +static void removeEvents(Fingerprint* acc, const Fingerprint* slice) +{ + size_t n; + for (n =3D 0; n < HASHTABLESIZE; n++) { + assert(acc->events[n] >=3D slice->events[n]); + acc->events[n] -=3D slice->events[n]; + } + acc->nbEvents -=3D slice->nbEvents; +} + +#define CHUNKSIZE (8 << 10) +static size_t ZSTD_splitBlock_byChunks(const void* blockStart, size_t bloc= kSize, + int level, + void* workspace, size_t wkspSize) +{ + static const RecordEvents_f records_fs[] =3D { + FP_RECORD(43), FP_RECORD(11), FP_RECORD(5), FP_RECORD(1) + }; + static const unsigned hashParams[] =3D { 8, 9, 10, 10 }; + const RecordEvents_f record_f =3D (assert(0<=3Dlevel && level<=3D3), r= ecords_fs[level]); + FPStats* const fpstats =3D (FPStats*)workspace; + const char* p =3D (const char*)blockStart; + int penalty =3D THRESHOLD_PENALTY; + size_t pos =3D 0; + assert(blockSize =3D=3D (128 << 10)); + assert(workspace !=3D NULL); + assert((size_t)workspace % ZSTD_ALIGNOF(FPStats) =3D=3D 0); + ZSTD_STATIC_ASSERT(ZSTD_SLIPBLOCK_WORKSPACESIZE >=3D sizeof(FPStats)); + assert(wkspSize >=3D sizeof(FPStats)); (void)wkspSize; + + initStats(fpstats); + record_f(&fpstats->pastEvents, p, CHUNKSIZE); + for (pos =3D CHUNKSIZE; pos <=3D blockSize - CHUNKSIZE; pos +=3D CHUNK= SIZE) { + record_f(&fpstats->newEvents, p + pos, CHUNKSIZE); + if (compareFingerprints(&fpstats->pastEvents, &fpstats->newEvents,= penalty, hashParams[level])) { + return pos; + } else { + mergeEvents(&fpstats->pastEvents, &fpstats->newEvents); + if (penalty > 0) penalty--; + } + } + assert(pos =3D=3D blockSize); + return blockSize; + (void)flushEvents; (void)removeEvents; +} + +/* ZSTD_splitBlock_fromBorders(): very fast strategy : + * compare fingerprint from beginning and end of the block, + * derive from their difference if it's preferable to split in the middle, + * repeat the process a second time, for finer grained decision. + * 3 times did not bring improvements, so I stopped at 2. + * Benefits are good enough for a cheap heuristic. + * More accurate splitting saves more, but speed impact is also more perce= ptible. + * For better accuracy, use more elaborate variant *_byChunks. + */ +static size_t ZSTD_splitBlock_fromBorders(const void* blockStart, size_t b= lockSize, + void* workspace, size_t wkspSize) +{ +#define SEGMENT_SIZE 512 + FPStats* const fpstats =3D (FPStats*)workspace; + Fingerprint* middleEvents =3D (Fingerprint*)(void*)((char*)workspace += 512 * sizeof(unsigned)); + assert(blockSize =3D=3D (128 << 10)); + assert(workspace !=3D NULL); + assert((size_t)workspace % ZSTD_ALIGNOF(FPStats) =3D=3D 0); + ZSTD_STATIC_ASSERT(ZSTD_SLIPBLOCK_WORKSPACESIZE >=3D sizeof(FPStats)); + assert(wkspSize >=3D sizeof(FPStats)); (void)wkspSize; + + initStats(fpstats); + HIST_add(fpstats->pastEvents.events, blockStart, SEGMENT_SIZE); + HIST_add(fpstats->newEvents.events, (const char*)blockStart + blockSiz= e - SEGMENT_SIZE, SEGMENT_SIZE); + fpstats->pastEvents.nbEvents =3D fpstats->newEvents.nbEvents =3D SEGME= NT_SIZE; + if (!compareFingerprints(&fpstats->pastEvents, &fpstats->newEvents, 0,= 8)) + return blockSize; + + HIST_add(middleEvents->events, (const char*)blockStart + blockSize/2 -= SEGMENT_SIZE/2, SEGMENT_SIZE); + middleEvents->nbEvents =3D SEGMENT_SIZE; + { U64 const distFromBegin =3D fpDistance(&fpstats->pastEvents, middl= eEvents, 8); + U64 const distFromEnd =3D fpDistance(&fpstats->newEvents, middleEv= ents, 8); + U64 const minDistance =3D SEGMENT_SIZE * SEGMENT_SIZE / 3; + if (abs64((S64)distFromBegin - (S64)distFromEnd) < minDistance) + return 64 KB; + return (distFromBegin > distFromEnd) ? 32 KB : 96 KB; + } +} + +size_t ZSTD_splitBlock(const void* blockStart, size_t blockSize, + int level, + void* workspace, size_t wkspSize) +{ + DEBUGLOG(6, "ZSTD_splitBlock (level=3D%i)", level); + assert(0<=3Dlevel && level<=3D4); + if (level =3D=3D 0) + return ZSTD_splitBlock_fromBorders(blockStart, blockSize, workspac= e, wkspSize); + /* level >=3D 1*/ + return ZSTD_splitBlock_byChunks(blockStart, blockSize, level-1, worksp= ace, wkspSize); +} diff --git a/lib/zstd/compress/zstd_preSplit.h b/lib/zstd/compress/zstd_pre= Split.h new file mode 100644 index 000000000000..f98f797fe191 --- /dev/null +++ b/lib/zstd/compress/zstd_preSplit.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ +/* + * Copyright (c) Meta Platforms, Inc. and affiliates. + * All rights reserved. + * + * This source code is licensed under both the BSD-style license (found in= the + * LICENSE file in the root directory of this source tree) and the GPLv2 (= found + * in the COPYING file in the root directory of this source tree). + * You may select, at your option, one of the above-listed licenses. + */ + +#ifndef ZSTD_PRESPLIT_H +#define ZSTD_PRESPLIT_H + +#include <stddef.h> /* size_t */ + +#define ZSTD_SLIPBLOCK_WORKSPACESIZE 8208 + +/* ZSTD_splitBlock(): + * @level must be a value between 0 and 4. + * higher levels spend more energy to detect block boundaries. + * @workspace must be aligned for size_t. + * @wkspSize must be at least >=3D ZSTD_SLIPBLOCK_WORKSPACESIZE + * note: + * For the time being, this function only accepts full 128 KB blocks. + * Therefore, @blockSize must be =3D=3D 128 KB. + * While this could be extended to smaller sizes in the future, + * it is not yet clear if this would be useful. TBD. 
+ */ +size_t ZSTD_splitBlock(const void* blockStart, size_t blockSize, + int level, + void* workspace, size_t wkspSize); + +#endif /* ZSTD_PRESPLIT_H */ diff --git a/lib/zstd/decompress/huf_decompress.c b/lib/zstd/decompress/huf= _decompress.c index 60958afebc41..ac8b87f48f84 100644 --- a/lib/zstd/decompress/huf_decompress.c +++ b/lib/zstd/decompress/huf_decompress.c @@ -1,7 +1,8 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* ****************************************************************** * huff0 huffman decoder, * part of Finite State Entropy library - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * * You can contact the author at : * - FSE+HUF source repository : https://github.com/Cyan4973/FiniteStateE= ntropy @@ -19,10 +20,10 @@ #include "../common/compiler.h" #include "../common/bitstream.h" /* BIT_* */ #include "../common/fse.h" /* to compress headers */ -#define HUF_STATIC_LINKING_ONLY #include "../common/huf.h" #include "../common/error_private.h" #include "../common/zstd_internal.h" +#include "../common/bits.h" /* ZSTD_highbit32, ZSTD_countTrailingZer= os64 */ =20 /* ************************************************************** * Constants @@ -34,6 +35,12 @@ * Macros ****************************************************************/ =20 +#ifdef HUF_DISABLE_FAST_DECODE +# define HUF_ENABLE_FAST_DECODE 0 +#else +# define HUF_ENABLE_FAST_DECODE 1 +#endif + /* These two optional macros force the use one way or another of the two * Huffman decompression implementations. You can't force in both directio= ns * at the same time. @@ -43,27 +50,25 @@ #error "Cannot force the use of the X1 and X2 decoders at the same time!" #endif =20 -#if ZSTD_ENABLE_ASM_X86_64_BMI2 && DYNAMIC_BMI2 -# define HUF_ASM_X86_64_BMI2_ATTRS BMI2_TARGET_ATTRIBUTE +/* When DYNAMIC_BMI2 is enabled, fast decoders are only called when bmi2 is + * supported at runtime, so we can add the BMI2 target attribute. + * When it is disabled, we will still get BMI2 if it is enabled statically. 
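+ * (For example, if __BMI2__ is already defined at compile time, the regular
+ * code paths are compiled with BMI2 code generation anyway, so neither the
+ * target attribute nor a runtime check is needed.)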
+ */ +#if DYNAMIC_BMI2 +# define HUF_FAST_BMI2_ATTRS BMI2_TARGET_ATTRIBUTE #else -# define HUF_ASM_X86_64_BMI2_ATTRS +# define HUF_FAST_BMI2_ATTRS #endif =20 #define HUF_EXTERN_C #define HUF_ASM_DECL HUF_EXTERN_C =20 -#if DYNAMIC_BMI2 || (ZSTD_ENABLE_ASM_X86_64_BMI2 && defined(__BMI2__)) +#if DYNAMIC_BMI2 # define HUF_NEED_BMI2_FUNCTION 1 #else # define HUF_NEED_BMI2_FUNCTION 0 #endif =20 -#if !(ZSTD_ENABLE_ASM_X86_64_BMI2 && defined(__BMI2__)) -# define HUF_NEED_DEFAULT_FUNCTION 1 -#else -# define HUF_NEED_DEFAULT_FUNCTION 0 -#endif - /* ************************************************************** * Error Management ****************************************************************/ @@ -80,6 +85,11 @@ /* ************************************************************** * BMI2 Variant Wrappers ****************************************************************/ +typedef size_t (*HUF_DecompressUsingDTableFn)(void *dst, size_t dstSize, + const void *cSrc, + size_t cSrcSize, + const HUF_DTable *DTable); + #if DYNAMIC_BMI2 =20 #define HUF_DGEN(fn) = \ @@ -101,9 +111,9 @@ } = \ = \ static size_t fn(void* dst, size_t dstSize, void const* cSrc, = \ - size_t cSrcSize, HUF_DTable const* DTable, int bmi2) = \ + size_t cSrcSize, HUF_DTable const* DTable, int flags)= \ { = \ - if (bmi2) { = \ + if (flags & HUF_flags_bmi2) { = \ return fn##_bmi2(dst, dstSize, cSrc, cSrcSize, DTable); = \ } = \ return fn##_default(dst, dstSize, cSrc, cSrcSize, DTable); = \ @@ -113,9 +123,9 @@ =20 #define HUF_DGEN(fn) = \ static size_t fn(void* dst, size_t dstSize, void const* cSrc, = \ - size_t cSrcSize, HUF_DTable const* DTable, int bmi2) = \ + size_t cSrcSize, HUF_DTable const* DTable, int flags)= \ { = \ - (void)bmi2; = \ + (void)flags; = \ return fn##_body(dst, dstSize, cSrc, cSrcSize, DTable); = \ } =20 @@ -134,43 +144,66 @@ static DTableDesc HUF_getDTableDesc(const HUF_DTable*= table) return dtd; } =20 -#if ZSTD_ENABLE_ASM_X86_64_BMI2 - -static size_t HUF_initDStream(BYTE const* ip) { +static size_t HUF_initFastDStream(BYTE const* ip) { BYTE const lastByte =3D ip[7]; - size_t const bitsConsumed =3D lastByte ? 8 - BIT_highbit32(lastByte) := 0; + size_t const bitsConsumed =3D lastByte ? 8 - ZSTD_highbit32(lastByte) = : 0; size_t const value =3D MEM_readLEST(ip) | 1; assert(bitsConsumed <=3D 8); + assert(sizeof(size_t) =3D=3D 8); return value << bitsConsumed; } + + +/* + * The input/output arguments to the Huffman fast decoding loop: + * + * ip [in/out] - The input pointers, must be updated to reflect what is co= nsumed. + * op [in/out] - The output pointers, must be updated to reflect what is w= ritten. + * bits [in/out] - The bitstream containers, must be updated to reflect th= e current state. + * dt [in] - The decoding table. + * ilowest [in] - The beginning of the valid range of the input. Decoders = may read + * down to this pointer. It may be below iend[0]. + * oend [in] - The end of the output stream. op[3] must not cross oend. + * iend [in] - The end of each input stream. ip[i] may cross iend[i], + * as long as it is above ilowest, but that indicates corrupti= on. + */ typedef struct { BYTE const* ip[4]; BYTE* op[4]; U64 bits[4]; void const* dt; - BYTE const* ilimit; + BYTE const* ilowest; BYTE* oend; BYTE const* iend[4]; -} HUF_DecompressAsmArgs; +} HUF_DecompressFastArgs; + +typedef void (*HUF_DecompressFastLoopFn)(HUF_DecompressFastArgs*); =20 /* - * Initializes args for the asm decoding loop. - * @returns 0 on success - * 1 if the fallback implementation should be used. 
+ * Initializes args for the fast decoding loop. + * @returns 1 on success + * 0 if the fallback implementation should be used. * Or an error code on failure. */ -static size_t HUF_DecompressAsmArgs_init(HUF_DecompressAsmArgs* args, void= * dst, size_t dstSize, void const* src, size_t srcSize, const HUF_DTable* D= Table) +static size_t HUF_DecompressFastArgs_init(HUF_DecompressFastArgs* args, vo= id* dst, size_t dstSize, void const* src, size_t srcSize, const HUF_DTable*= DTable) { void const* dt =3D DTable + 1; U32 const dtLog =3D HUF_getDTableDesc(DTable).tableLog; =20 - const BYTE* const ilimit =3D (const BYTE*)src + 6 + 8; + const BYTE* const istart =3D (const BYTE*)src; =20 - BYTE* const oend =3D (BYTE*)dst + dstSize; + BYTE* const oend =3D ZSTD_maybeNullPtrAdd((BYTE*)dst, dstSize); =20 - /* The following condition is false on x32 platform, - * but HUF_asm is not compatible with this ABI */ - if (!(MEM_isLittleEndian() && !MEM_32bits())) return 1; + /* The fast decoding loop assumes 64-bit little-endian. + * This condition is false on x32. + */ + if (!MEM_isLittleEndian() || MEM_32bits()) + return 0; + + /* Avoid nullptr addition */ + if (dstSize =3D=3D 0) + return 0; + assert(dst !=3D NULL); =20 /* strict minimum : jump table + 1 byte per stream */ if (srcSize < 10) @@ -181,11 +214,10 @@ static size_t HUF_DecompressAsmArgs_init(HUF_Decompre= ssAsmArgs* args, void* dst, * On small inputs we don't have enough data to trigger the fast loop,= so use the old decoder. */ if (dtLog !=3D HUF_DECODER_FAST_TABLELOG) - return 1; + return 0; =20 /* Read the jump table. */ { - const BYTE* const istart =3D (const BYTE*)src; size_t const length1 =3D MEM_readLE16(istart); size_t const length2 =3D MEM_readLE16(istart+2); size_t const length3 =3D MEM_readLE16(istart+4); @@ -195,13 +227,11 @@ static size_t HUF_DecompressAsmArgs_init(HUF_Decompre= ssAsmArgs* args, void* dst, args->iend[2] =3D args->iend[1] + length2; args->iend[3] =3D args->iend[2] + length3; =20 - /* HUF_initDStream() requires this, and this small of an input + /* HUF_initFastDStream() requires this, and this small of an input * won't benefit from the ASM loop anyways. - * length1 must be >=3D 16 so that ip[0] >=3D ilimit before the lo= op - * starts. */ - if (length1 < 16 || length2 < 8 || length3 < 8 || length4 < 8) - return 1; + if (length1 < 8 || length2 < 8 || length3 < 8 || length4 < 8) + return 0; if (length4 > srcSize) return ERROR(corruption_detected); /* ove= rflow */ } /* ip[] contains the position that is currently loaded into bits[]. */ @@ -218,7 +248,7 @@ static size_t HUF_DecompressAsmArgs_init(HUF_Decompress= AsmArgs* args, void* dst, =20 /* No point to call the ASM loop for tiny outputs. */ if (args->op[3] >=3D oend) - return 1; + return 0; =20 /* bits[] is the bit container. * It is read from the MSB down to the LSB. @@ -227,24 +257,25 @@ static size_t HUF_DecompressAsmArgs_init(HUF_Decompre= ssAsmArgs* args, void* dst, * set, so that CountTrailingZeros(bits[]) can be used * to count how many bits we've consumed. */ - args->bits[0] =3D HUF_initDStream(args->ip[0]); - args->bits[1] =3D HUF_initDStream(args->ip[1]); - args->bits[2] =3D HUF_initDStream(args->ip[2]); - args->bits[3] =3D HUF_initDStream(args->ip[3]); - - /* If ip[] >=3D ilimit, it is guaranteed to be safe to - * reload bits[]. It may be beyond its section, but is - * guaranteed to be valid (>=3D istart). 
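+ * (Each stream must hold at least 8 bytes so that HUF_initFastDStream()
+ * can load one full 64-bit word; with ilowest =3D=3D istart, later refills
+ * may read below a stream's own section but never below the buffer.)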
- */ - args->ilimit =3D ilimit; + args->bits[0] =3D HUF_initFastDStream(args->ip[0]); + args->bits[1] =3D HUF_initFastDStream(args->ip[1]); + args->bits[2] =3D HUF_initFastDStream(args->ip[2]); + args->bits[3] =3D HUF_initFastDStream(args->ip[3]); + + /* The decoders must be sure to never read beyond ilowest. + * This is lower than iend[0], but allowing decoders to read + * down to ilowest can allow an extra iteration or two in the + * fast loop. + */ + args->ilowest =3D istart; =20 args->oend =3D oend; args->dt =3D dt; =20 - return 0; + return 1; } =20 -static size_t HUF_initRemainingDStream(BIT_DStream_t* bit, HUF_DecompressA= smArgs const* args, int stream, BYTE* segmentEnd) +static size_t HUF_initRemainingDStream(BIT_DStream_t* bit, HUF_DecompressF= astArgs const* args, int stream, BYTE* segmentEnd) { /* Validate that we haven't overwritten. */ if (args->op[stream] > segmentEnd) @@ -258,15 +289,33 @@ static size_t HUF_initRemainingDStream(BIT_DStream_t*= bit, HUF_DecompressAsmArgs return ERROR(corruption_detected); =20 /* Construct the BIT_DStream_t. */ - bit->bitContainer =3D MEM_readLE64(args->ip[stream]); - bit->bitsConsumed =3D ZSTD_countTrailingZeros((size_t)args->bits[strea= m]); - bit->start =3D (const char*)args->iend[0]; + assert(sizeof(size_t) =3D=3D 8); + bit->bitContainer =3D MEM_readLEST(args->ip[stream]); + bit->bitsConsumed =3D ZSTD_countTrailingZeros64(args->bits[stream]); + bit->start =3D (const char*)args->ilowest; bit->limitPtr =3D bit->start + sizeof(size_t); bit->ptr =3D (const char*)args->ip[stream]; =20 return 0; } -#endif + +/* Calls X(N) for each stream 0, 1, 2, 3. */ +#define HUF_4X_FOR_EACH_STREAM(X) \ + do { \ + X(0); \ + X(1); \ + X(2); \ + X(3); \ + } while (0) + +/* Calls X(N, var) for each stream 0, 1, 2, 3. */ +#define HUF_4X_FOR_EACH_STREAM_WITH_VAR(X, var) \ + do { \ + X(0, (var)); \ + X(1, (var)); \ + X(2, (var)); \ + X(3, (var)); \ + } while (0) =20 =20 #ifndef HUF_FORCE_DECOMPRESS_X2 @@ -283,10 +332,11 @@ typedef struct { BYTE nbBits; BYTE byte; } HUF_DEltX1= ; /* single-symbol decodi static U64 HUF_DEltX1_set4(BYTE symbol, BYTE nbBits) { U64 D4; if (MEM_isLittleEndian()) { - D4 =3D (symbol << 8) + nbBits; + D4 =3D (U64)((symbol << 8) + nbBits); } else { - D4 =3D symbol + (nbBits << 8); + D4 =3D (U64)(symbol + (nbBits << 8)); } + assert(D4 < (1U << 16)); D4 *=3D 0x0001000100010001ULL; return D4; } @@ -329,13 +379,7 @@ typedef struct { BYTE huffWeight[HUF_SYMBOLVALUE_MAX + 1]; } HUF_ReadDTableX1_Workspace; =20 - -size_t HUF_readDTableX1_wksp(HUF_DTable* DTable, const void* src, size_t s= rcSize, void* workSpace, size_t wkspSize) -{ - return HUF_readDTableX1_wksp_bmi2(DTable, src, srcSize, workSpace, wks= pSize, /* bmi2 */ 0); -} - -size_t HUF_readDTableX1_wksp_bmi2(HUF_DTable* DTable, const void* src, siz= e_t srcSize, void* workSpace, size_t wkspSize, int bmi2) +size_t HUF_readDTableX1_wksp(HUF_DTable* DTable, const void* src, size_t s= rcSize, void* workSpace, size_t wkspSize, int flags) { U32 tableLog =3D 0; U32 nbSymbols =3D 0; @@ -350,7 +394,7 @@ size_t HUF_readDTableX1_wksp_bmi2(HUF_DTable* DTable, c= onst void* src, size_t sr DEBUG_STATIC_ASSERT(sizeof(DTableDesc) =3D=3D sizeof(HUF_DTable)); /* ZSTD_memset(huffWeight, 0, sizeof(huffWeight)); */ /* is not nece= ssary, even though some analyzer complain ... 
*/ =20 - iSize =3D HUF_readStats_wksp(wksp->huffWeight, HUF_SYMBOLVALUE_MAX + 1= , wksp->rankVal, &nbSymbols, &tableLog, src, srcSize, wksp->statsWksp, size= of(wksp->statsWksp), bmi2); + iSize =3D HUF_readStats_wksp(wksp->huffWeight, HUF_SYMBOLVALUE_MAX + 1= , wksp->rankVal, &nbSymbols, &tableLog, src, srcSize, wksp->statsWksp, size= of(wksp->statsWksp), flags); if (HUF_isError(iSize)) return iSize; =20 =20 @@ -377,9 +421,8 @@ size_t HUF_readDTableX1_wksp_bmi2(HUF_DTable* DTable, c= onst void* src, size_t sr * rankStart[0] is not filled because there are no entries in the tabl= e for * weight 0. */ - { - int n; - int nextRankStart =3D 0; + { int n; + U32 nextRankStart =3D 0; int const unroll =3D 4; int const nLimit =3D (int)nbSymbols - unroll + 1; for (n=3D0; n<(int)tableLog+1; n++) { @@ -406,10 +449,9 @@ size_t HUF_readDTableX1_wksp_bmi2(HUF_DTable* DTable, = const void* src, size_t sr * We can switch based on the length to a different inner loop which is * optimized for that particular case. */ - { - U32 w; - int symbol=3Dwksp->rankVal[0]; - int rankStart=3D0; + { U32 w; + int symbol =3D wksp->rankVal[0]; + int rankStart =3D 0; for (w=3D1; w<tableLog+1; ++w) { int const symbolCount =3D wksp->rankVal[w]; int const length =3D (1 << w) >> 1; @@ -483,15 +525,19 @@ HUF_decodeSymbolX1(BIT_DStream_t* Dstream, const HUF_= DEltX1* dt, const U32 dtLog } =20 #define HUF_DECODE_SYMBOLX1_0(ptr, DStreamPtr) \ - *ptr++ =3D HUF_decodeSymbolX1(DStreamPtr, dt, dtLog) + do { *ptr++ =3D HUF_decodeSymbolX1(DStreamPtr, dt, dtLog); } while (0) =20 -#define HUF_DECODE_SYMBOLX1_1(ptr, DStreamPtr) \ - if (MEM_64bits() || (HUF_TABLELOG_MAX<=3D12)) \ - HUF_DECODE_SYMBOLX1_0(ptr, DStreamPtr) +#define HUF_DECODE_SYMBOLX1_1(ptr, DStreamPtr) \ + do { \ + if (MEM_64bits() || (HUF_TABLELOG_MAX<=3D12)) \ + HUF_DECODE_SYMBOLX1_0(ptr, DStreamPtr); \ + } while (0) =20 -#define HUF_DECODE_SYMBOLX1_2(ptr, DStreamPtr) \ - if (MEM_64bits()) \ - HUF_DECODE_SYMBOLX1_0(ptr, DStreamPtr) +#define HUF_DECODE_SYMBOLX1_2(ptr, DStreamPtr) \ + do { \ + if (MEM_64bits()) \ + HUF_DECODE_SYMBOLX1_0(ptr, DStreamPtr); \ + } while (0) =20 HINT_INLINE size_t HUF_decodeStreamX1(BYTE* p, BIT_DStream_t* const bitDPtr, BYTE* const pEnd= , const HUF_DEltX1* const dt, const U32 dtLog) @@ -519,7 +565,7 @@ HUF_decodeStreamX1(BYTE* p, BIT_DStream_t* const bitDPt= r, BYTE* const pEnd, cons while (p < pEnd) HUF_DECODE_SYMBOLX1_0(p, bitDPtr); =20 - return pEnd-pStart; + return (size_t)(pEnd-pStart); } =20 FORCE_INLINE_TEMPLATE size_t @@ -529,7 +575,7 @@ HUF_decompress1X1_usingDTable_internal_body( const HUF_DTable* DTable) { BYTE* op =3D (BYTE*)dst; - BYTE* const oend =3D op + dstSize; + BYTE* const oend =3D ZSTD_maybeNullPtrAdd(op, dstSize); const void* dtPtr =3D DTable + 1; const HUF_DEltX1* const dt =3D (const HUF_DEltX1*)dtPtr; BIT_DStream_t bitD; @@ -545,6 +591,10 @@ HUF_decompress1X1_usingDTable_internal_body( return dstSize; } =20 +/* HUF_decompress4X1_usingDTable_internal_body(): + * Conditions : + * @dstSize >=3D 6 + */ FORCE_INLINE_TEMPLATE size_t HUF_decompress4X1_usingDTable_internal_body( void* dst, size_t dstSize, @@ -553,6 +603,7 @@ HUF_decompress4X1_usingDTable_internal_body( { /* Check */ if (cSrcSize < 10) return ERROR(corruption_detected); /* strict minim= um : jump table + 1 byte per stream */ + if (dstSize < 6) return ERROR(corruption_detected); /* stream = 4-split doesn't work */ =20 { const BYTE* const istart =3D (const BYTE*) cSrc; BYTE* const ostart =3D (BYTE*) dst; @@ -588,6 +639,7 @@ HUF_decompress4X1_usingDTable_internal_body( =20 if (length4 > cSrcSize) return 
ERROR(corruption_detected); /* ov= erflow */ if (opStart4 > oend) return ERROR(corruption_detected); /* ov= erflow */ + assert(dstSize >=3D 6); /* validated above */ CHECK_F( BIT_initDStream(&bitD1, istart1, length1) ); CHECK_F( BIT_initDStream(&bitD2, istart2, length2) ); CHECK_F( BIT_initDStream(&bitD3, istart3, length3) ); @@ -650,52 +702,173 @@ size_t HUF_decompress4X1_usingDTable_internal_bmi2(v= oid* dst, size_t dstSize, vo } #endif =20 -#if HUF_NEED_DEFAULT_FUNCTION static size_t HUF_decompress4X1_usingDTable_internal_default(void* dst, size_t ds= tSize, void const* cSrc, size_t cSrcSize, HUF_DTable const* DTable) { return HUF_decompress4X1_usingDTable_internal_body(dst, dstSize, cSrc,= cSrcSize, DTable); } -#endif =20 #if ZSTD_ENABLE_ASM_X86_64_BMI2 =20 -HUF_ASM_DECL void HUF_decompress4X1_usingDTable_internal_bmi2_asm_loop(HUF= _DecompressAsmArgs* args) ZSTDLIB_HIDDEN; +HUF_ASM_DECL void HUF_decompress4X1_usingDTable_internal_fast_asm_loop(HUF= _DecompressFastArgs* args) ZSTDLIB_HIDDEN; + +#endif + +static HUF_FAST_BMI2_ATTRS +void HUF_decompress4X1_usingDTable_internal_fast_c_loop(HUF_DecompressFast= Args* args) +{ + U64 bits[4]; + BYTE const* ip[4]; + BYTE* op[4]; + U16 const* const dtable =3D (U16 const*)args->dt; + BYTE* const oend =3D args->oend; + BYTE const* const ilowest =3D args->ilowest; + + /* Copy the arguments to local variables */ + ZSTD_memcpy(&bits, &args->bits, sizeof(bits)); + ZSTD_memcpy((void*)(&ip), &args->ip, sizeof(ip)); + ZSTD_memcpy(&op, &args->op, sizeof(op)); + + assert(MEM_isLittleEndian()); + assert(!MEM_32bits()); + + for (;;) { + BYTE* olimit; + int stream; + + /* Assert loop preconditions */ +#ifndef NDEBUG + for (stream =3D 0; stream < 4; ++stream) { + assert(op[stream] <=3D (stream =3D=3D 3 ? oend : op[stream + 1= ])); + assert(ip[stream] >=3D ilowest); + } +#endif + /* Compute olimit */ + { + /* Each iteration produces 5 output symbols per stream */ + size_t const oiters =3D (size_t)(oend - op[3]) / 5; + /* Each iteration consumes up to 11 bits * 5 =3D 55 bits < 7 b= ytes + * per stream. + */ + size_t const iiters =3D (size_t)(ip[0] - ilowest) / 7; + /* We can safely run iters iterations before running bounds ch= ecks */ + size_t const iters =3D MIN(oiters, iiters); + size_t const symbols =3D iters * 5; + + /* We can simply check that op[3] < olimit, instead of checkin= g all + * of our bounds, since we can't hit the other bounds until we= 've run + * iters iterations, which only happens when op[3] =3D=3D olim= it. + */ + olimit =3D op[3] + symbols; + + /* Exit fast decoding loop once we reach the end. */ + if (op[3] =3D=3D olimit) + break; + + /* Exit the decoding loop if any input pointer has crossed the + * previous one. This indicates corruption, and a precondition + * to our loop is that ip[i] >=3D ip[0]. 
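+ * (The 4 bitstreams are laid out back to back and each is consumed
+ * backwards from its own end, so ip[stream] dropping below ip[stream-1]
+ * means that stream has consumed more bytes than its section holds.)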
+ */ + for (stream =3D 1; stream < 4; ++stream) { + if (ip[stream] < ip[stream - 1]) + goto _out; + } + } + +#ifndef NDEBUG + for (stream =3D 1; stream < 4; ++stream) { + assert(ip[stream] >=3D ip[stream - 1]); + } +#endif + +#define HUF_4X1_DECODE_SYMBOL(_stream, _symbol) \ + do { \ + int const index =3D (int)(bits[(_stream)] >> 53); \ + int const entry =3D (int)dtable[index]; \ + bits[(_stream)] <<=3D (entry & 0x3F); \ + op[(_stream)][(_symbol)] =3D (BYTE)((entry >> 8) & 0xFF); \ + } while (0) + +#define HUF_4X1_RELOAD_STREAM(_stream) \ + do { \ + int const ctz =3D ZSTD_countTrailingZeros64(bits[(_stream)]); \ + int const nbBits =3D ctz & 7; \ + int const nbBytes =3D ctz >> 3; \ + op[(_stream)] +=3D 5; \ + ip[(_stream)] -=3D nbBytes; \ + bits[(_stream)] =3D MEM_read64(ip[(_stream)]) | 1; \ + bits[(_stream)] <<=3D nbBits; \ + } while (0) + + /* Manually unroll the loop because compilers don't consistently + * unroll the inner loops, which destroys performance. + */ + do { + /* Decode 5 symbols in each of the 4 streams */ + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X1_DECODE_SYMBOL, 0); + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X1_DECODE_SYMBOL, 1); + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X1_DECODE_SYMBOL, 2); + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X1_DECODE_SYMBOL, 3); + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X1_DECODE_SYMBOL, 4); + + /* Reload each of the 4 the bitstreams */ + HUF_4X_FOR_EACH_STREAM(HUF_4X1_RELOAD_STREAM); + } while (op[3] < olimit); + +#undef HUF_4X1_DECODE_SYMBOL +#undef HUF_4X1_RELOAD_STREAM + } =20 -static HUF_ASM_X86_64_BMI2_ATTRS +_out: + + /* Save the final values of each of the state variables back to args. = */ + ZSTD_memcpy(&args->bits, &bits, sizeof(bits)); + ZSTD_memcpy((void*)(&args->ip), &ip, sizeof(ip)); + ZSTD_memcpy(&args->op, &op, sizeof(op)); +} + +/* + * @returns @p dstSize on success (>=3D 6) + * 0 if the fallback implementation should be used + * An error if an error occurred + */ +static HUF_FAST_BMI2_ATTRS size_t -HUF_decompress4X1_usingDTable_internal_bmi2_asm( +HUF_decompress4X1_usingDTable_internal_fast( void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, - const HUF_DTable* DTable) + const HUF_DTable* DTable, + HUF_DecompressFastLoopFn loopFn) { void const* dt =3D DTable + 1; - const BYTE* const iend =3D (const BYTE*)cSrc + 6; - BYTE* const oend =3D (BYTE*)dst + dstSize; - HUF_DecompressAsmArgs args; - { - size_t const ret =3D HUF_DecompressAsmArgs_init(&args, dst, dstSiz= e, cSrc, cSrcSize, DTable); - FORWARD_IF_ERROR(ret, "Failed to init asm args"); - if (ret !=3D 0) - return HUF_decompress4X1_usingDTable_internal_bmi2(dst, dstSiz= e, cSrc, cSrcSize, DTable); + BYTE const* const ilowest =3D (BYTE const*)cSrc; + BYTE* const oend =3D ZSTD_maybeNullPtrAdd((BYTE*)dst, dstSize); + HUF_DecompressFastArgs args; + { size_t const ret =3D HUF_DecompressFastArgs_init(&args, dst, dstSi= ze, cSrc, cSrcSize, DTable); + FORWARD_IF_ERROR(ret, "Failed to init fast loop args"); + if (ret =3D=3D 0) + return 0; } =20 - assert(args.ip[0] >=3D args.ilimit); - HUF_decompress4X1_usingDTable_internal_bmi2_asm_loop(&args); + assert(args.ip[0] >=3D args.ilowest); + loopFn(&args); =20 - /* Our loop guarantees that ip[] >=3D ilimit and that we haven't + /* Our loop guarantees that ip[] >=3D ilowest and that we haven't * overwritten any op[]. 
*/ - assert(args.ip[0] >=3D iend); - assert(args.ip[1] >=3D iend); - assert(args.ip[2] >=3D iend); - assert(args.ip[3] >=3D iend); + assert(args.ip[0] >=3D ilowest); + assert(args.ip[0] >=3D ilowest); + assert(args.ip[1] >=3D ilowest); + assert(args.ip[2] >=3D ilowest); + assert(args.ip[3] >=3D ilowest); assert(args.op[3] <=3D oend); - (void)iend; + + assert(ilowest =3D=3D args.ilowest); + assert(ilowest + 6 =3D=3D args.iend[0]); + (void)ilowest; =20 /* finish bit streams one by one. */ - { - size_t const segmentSize =3D (dstSize+3) / 4; + { size_t const segmentSize =3D (dstSize+3) / 4; BYTE* segmentEnd =3D (BYTE*)dst; int i; for (i =3D 0; i < 4; ++i) { @@ -712,97 +885,59 @@ HUF_decompress4X1_usingDTable_internal_bmi2_asm( } =20 /* decoded size */ + assert(dstSize !=3D 0); return dstSize; } -#endif /* ZSTD_ENABLE_ASM_X86_64_BMI2 */ - -typedef size_t (*HUF_decompress_usingDTable_t)(void *dst, size_t dstSize, - const void *cSrc, - size_t cSrcSize, - const HUF_DTable *DTable); =20 HUF_DGEN(HUF_decompress1X1_usingDTable_internal) =20 static size_t HUF_decompress4X1_usingDTable_internal(void* dst, size_t dst= Size, void const* cSrc, - size_t cSrcSize, HUF_DTable const* DTable, int bmi2) + size_t cSrcSize, HUF_DTable const* DTable, int flags) { + HUF_DecompressUsingDTableFn fallbackFn =3D HUF_decompress4X1_usingDTab= le_internal_default; + HUF_DecompressFastLoopFn loopFn =3D HUF_decompress4X1_usingDTable_inte= rnal_fast_c_loop; + #if DYNAMIC_BMI2 - if (bmi2) { + if (flags & HUF_flags_bmi2) { + fallbackFn =3D HUF_decompress4X1_usingDTable_internal_bmi2; # if ZSTD_ENABLE_ASM_X86_64_BMI2 - return HUF_decompress4X1_usingDTable_internal_bmi2_asm(dst, dstSiz= e, cSrc, cSrcSize, DTable); -# else - return HUF_decompress4X1_usingDTable_internal_bmi2(dst, dstSize, c= Src, cSrcSize, DTable); + if (!(flags & HUF_flags_disableAsm)) { + loopFn =3D HUF_decompress4X1_usingDTable_internal_fast_asm_loo= p; + } # endif + } else { + return fallbackFn(dst, dstSize, cSrc, cSrcSize, DTable); } -#else - (void)bmi2; #endif =20 #if ZSTD_ENABLE_ASM_X86_64_BMI2 && defined(__BMI2__) - return HUF_decompress4X1_usingDTable_internal_bmi2_asm(dst, dstSize, c= Src, cSrcSize, DTable); -#else - return HUF_decompress4X1_usingDTable_internal_default(dst, dstSize, cS= rc, cSrcSize, DTable); + if (!(flags & HUF_flags_disableAsm)) { + loopFn =3D HUF_decompress4X1_usingDTable_internal_fast_asm_loop; + } #endif -} - - -size_t HUF_decompress1X1_usingDTable( - void* dst, size_t dstSize, - const void* cSrc, size_t cSrcSize, - const HUF_DTable* DTable) -{ - DTableDesc dtd =3D HUF_getDTableDesc(DTable); - if (dtd.tableType !=3D 0) return ERROR(GENERIC); - return HUF_decompress1X1_usingDTable_internal(dst, dstSize, cSrc, cSrc= Size, DTable, /* bmi2 */ 0); -} =20 -size_t HUF_decompress1X1_DCtx_wksp(HUF_DTable* DCtx, void* dst, size_t dst= Size, - const void* cSrc, size_t cSrcSize, - void* workSpace, size_t wkspSize) -{ - const BYTE* ip =3D (const BYTE*) cSrc; - - size_t const hSize =3D HUF_readDTableX1_wksp(DCtx, cSrc, cSrcSize, wor= kSpace, wkspSize); - if (HUF_isError(hSize)) return hSize; - if (hSize >=3D cSrcSize) return ERROR(srcSize_wrong); - ip +=3D hSize; cSrcSize -=3D hSize; - - return HUF_decompress1X1_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, DCtx, /* bmi2 */ 0); -} - - -size_t HUF_decompress4X1_usingDTable( - void* dst, size_t dstSize, - const void* cSrc, size_t cSrcSize, - const HUF_DTable* DTable) -{ - DTableDesc dtd =3D HUF_getDTableDesc(DTable); - if (dtd.tableType !=3D 0) return ERROR(GENERIC); - return 
HUF_decompress4X1_usingDTable_internal(dst, dstSize, cSrc, cSrc= Size, DTable, /* bmi2 */ 0); + if (HUF_ENABLE_FAST_DECODE && !(flags & HUF_flags_disableFast)) { + size_t const ret =3D HUF_decompress4X1_usingDTable_internal_fast(d= st, dstSize, cSrc, cSrcSize, DTable, loopFn); + if (ret !=3D 0) + return ret; + } + return fallbackFn(dst, dstSize, cSrc, cSrcSize, DTable); } =20 -static size_t HUF_decompress4X1_DCtx_wksp_bmi2(HUF_DTable* dctx, void* dst= , size_t dstSize, +static size_t HUF_decompress4X1_DCtx_wksp(HUF_DTable* dctx, void* dst, siz= e_t dstSize, const void* cSrc, size_t cSrcSize, - void* workSpace, size_t wkspSize, int b= mi2) + void* workSpace, size_t wkspSize, int f= lags) { const BYTE* ip =3D (const BYTE*) cSrc; =20 - size_t const hSize =3D HUF_readDTableX1_wksp_bmi2(dctx, cSrc, cSrcSize= , workSpace, wkspSize, bmi2); + size_t const hSize =3D HUF_readDTableX1_wksp(dctx, cSrc, cSrcSize, wor= kSpace, wkspSize, flags); if (HUF_isError(hSize)) return hSize; if (hSize >=3D cSrcSize) return ERROR(srcSize_wrong); ip +=3D hSize; cSrcSize -=3D hSize; =20 - return HUF_decompress4X1_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, dctx, bmi2); -} - -size_t HUF_decompress4X1_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dst= Size, - const void* cSrc, size_t cSrcSize, - void* workSpace, size_t wkspSize) -{ - return HUF_decompress4X1_DCtx_wksp_bmi2(dctx, dst, dstSize, cSrc, cSrc= Size, workSpace, wkspSize, 0); + return HUF_decompress4X1_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, dctx, flags); } =20 - #endif /* HUF_FORCE_DECOMPRESS_X2 */ =20 =20 @@ -985,7 +1120,7 @@ static void HUF_fillDTableX2Level2(HUF_DEltX2* DTable,= U32 targetLog, const U32 =20 static void HUF_fillDTableX2(HUF_DEltX2* DTable, const U32 targetLog, const sortedSymbol_t* sortedList, - const U32* rankStart, rankValCol_t *rankValOrig= in, const U32 maxWeight, + const U32* rankStart, rankValCol_t* rankValOrig= in, const U32 maxWeight, const U32 nbBitsBaseline) { U32* const rankVal =3D rankValOrigin[0]; @@ -1040,14 +1175,7 @@ typedef struct { =20 size_t HUF_readDTableX2_wksp(HUF_DTable* DTable, const void* src, size_t srcSize, - void* workSpace, size_t wkspSize) -{ - return HUF_readDTableX2_wksp_bmi2(DTable, src, srcSize, workSpace, wks= pSize, /* bmi2 */ 0); -} - -size_t HUF_readDTableX2_wksp_bmi2(HUF_DTable* DTable, - const void* src, size_t srcSize, - void* workSpace, size_t wkspSize, int bmi2) + void* workSpace, size_t wkspSize, int flags) { U32 tableLog, maxW, nbSymbols; DTableDesc dtd =3D HUF_getDTableDesc(DTable); @@ -1069,7 +1197,7 @@ size_t HUF_readDTableX2_wksp_bmi2(HUF_DTable* DTable, if (maxTableLog > HUF_TABLELOG_MAX) return ERROR(tableLog_tooLarge); /* ZSTD_memset(weightList, 0, sizeof(weightList)); */ /* is not neces= sary, even though some analyzer complain ... 
*/ =20 - iSize =3D HUF_readStats_wksp(wksp->weightList, HUF_SYMBOLVALUE_MAX + 1= , wksp->rankStats, &nbSymbols, &tableLog, src, srcSize, wksp->calleeWksp, s= izeof(wksp->calleeWksp), bmi2); + iSize =3D HUF_readStats_wksp(wksp->weightList, HUF_SYMBOLVALUE_MAX + 1= , wksp->rankStats, &nbSymbols, &tableLog, src, srcSize, wksp->calleeWksp, s= izeof(wksp->calleeWksp), flags); if (HUF_isError(iSize)) return iSize; =20 /* check result */ @@ -1159,15 +1287,19 @@ HUF_decodeLastSymbolX2(void* op, BIT_DStream_t* DSt= ream, const HUF_DEltX2* dt, c } =20 #define HUF_DECODE_SYMBOLX2_0(ptr, DStreamPtr) \ - ptr +=3D HUF_decodeSymbolX2(ptr, DStreamPtr, dt, dtLog) + do { ptr +=3D HUF_decodeSymbolX2(ptr, DStreamPtr, dt, dtLog); } while = (0) =20 -#define HUF_DECODE_SYMBOLX2_1(ptr, DStreamPtr) \ - if (MEM_64bits() || (HUF_TABLELOG_MAX<=3D12)) \ - ptr +=3D HUF_decodeSymbolX2(ptr, DStreamPtr, dt, dtLog) +#define HUF_DECODE_SYMBOLX2_1(ptr, DStreamPtr) \ + do { \ + if (MEM_64bits() || (HUF_TABLELOG_MAX<=3D12)) \ + ptr +=3D HUF_decodeSymbolX2(ptr, DStreamPtr, dt, dtLog); \ + } while (0) =20 -#define HUF_DECODE_SYMBOLX2_2(ptr, DStreamPtr) \ - if (MEM_64bits()) \ - ptr +=3D HUF_decodeSymbolX2(ptr, DStreamPtr, dt, dtLog) +#define HUF_DECODE_SYMBOLX2_2(ptr, DStreamPtr) \ + do { \ + if (MEM_64bits()) \ + ptr +=3D HUF_decodeSymbolX2(ptr, DStreamPtr, dt, dtLog); \ + } while (0) =20 HINT_INLINE size_t HUF_decodeStreamX2(BYTE* p, BIT_DStream_t* bitDPtr, BYTE* const pEnd, @@ -1227,7 +1359,7 @@ HUF_decompress1X2_usingDTable_internal_body( =20 /* decode */ { BYTE* const ostart =3D (BYTE*) dst; - BYTE* const oend =3D ostart + dstSize; + BYTE* const oend =3D ZSTD_maybeNullPtrAdd(ostart, dstSize); const void* const dtPtr =3D DTable+1; /* force compiler to not u= se strict-aliasing */ const HUF_DEltX2* const dt =3D (const HUF_DEltX2*)dtPtr; DTableDesc const dtd =3D HUF_getDTableDesc(DTable); @@ -1240,6 +1372,11 @@ HUF_decompress1X2_usingDTable_internal_body( /* decoded size */ return dstSize; } + +/* HUF_decompress4X2_usingDTable_internal_body(): + * Conditions: + * @dstSize >=3D 6 + */ FORCE_INLINE_TEMPLATE size_t HUF_decompress4X2_usingDTable_internal_body( void* dst, size_t dstSize, @@ -1247,6 +1384,7 @@ HUF_decompress4X2_usingDTable_internal_body( const HUF_DTable* DTable) { if (cSrcSize < 10) return ERROR(corruption_detected); /* strict mini= mum : jump table + 1 byte per stream */ + if (dstSize < 6) return ERROR(corruption_detected); /* stream = 4-split doesn't work */ =20 { const BYTE* const istart =3D (const BYTE*) cSrc; BYTE* const ostart =3D (BYTE*) dst; @@ -1280,8 +1418,9 @@ HUF_decompress4X2_usingDTable_internal_body( DTableDesc const dtd =3D HUF_getDTableDesc(DTable); U32 const dtLog =3D dtd.tableLog; =20 - if (length4 > cSrcSize) return ERROR(corruption_detected); /* ov= erflow */ - if (opStart4 > oend) return ERROR(corruption_detected); /* ov= erflow */ + if (length4 > cSrcSize) return ERROR(corruption_detected); /* ove= rflow */ + if (opStart4 > oend) return ERROR(corruption_detected); /* ove= rflow */ + assert(dstSize >=3D 6 /* validated above */); CHECK_F( BIT_initDStream(&bitD1, istart1, length1) ); CHECK_F( BIT_initDStream(&bitD2, istart2, length2) ); CHECK_F( BIT_initDStream(&bitD3, istart3, length3) ); @@ -1366,44 +1505,191 @@ size_t HUF_decompress4X2_usingDTable_internal_bmi2= (void* dst, size_t dstSize, vo } #endif =20 -#if HUF_NEED_DEFAULT_FUNCTION static size_t HUF_decompress4X2_usingDTable_internal_default(void* dst, size_t ds= tSize, void const* cSrc, size_t cSrcSize, HUF_DTable const* DTable) { return 
HUF_decompress4X2_usingDTable_internal_body(dst, dstSize, cSrc,= cSrcSize, DTable); } -#endif =20 #if ZSTD_ENABLE_ASM_X86_64_BMI2 =20 -HUF_ASM_DECL void HUF_decompress4X2_usingDTable_internal_bmi2_asm_loop(HUF= _DecompressAsmArgs* args) ZSTDLIB_HIDDEN; +HUF_ASM_DECL void HUF_decompress4X2_usingDTable_internal_fast_asm_loop(HUF= _DecompressFastArgs* args) ZSTDLIB_HIDDEN; + +#endif + +static HUF_FAST_BMI2_ATTRS +void HUF_decompress4X2_usingDTable_internal_fast_c_loop(HUF_DecompressFast= Args* args) +{ + U64 bits[4]; + BYTE const* ip[4]; + BYTE* op[4]; + BYTE* oend[4]; + HUF_DEltX2 const* const dtable =3D (HUF_DEltX2 const*)args->dt; + BYTE const* const ilowest =3D args->ilowest; + + /* Copy the arguments to local registers. */ + ZSTD_memcpy(&bits, &args->bits, sizeof(bits)); + ZSTD_memcpy((void*)(&ip), &args->ip, sizeof(ip)); + ZSTD_memcpy(&op, &args->op, sizeof(op)); + + oend[0] =3D op[1]; + oend[1] =3D op[2]; + oend[2] =3D op[3]; + oend[3] =3D args->oend; + + assert(MEM_isLittleEndian()); + assert(!MEM_32bits()); + + for (;;) { + BYTE* olimit; + int stream; + + /* Assert loop preconditions */ +#ifndef NDEBUG + for (stream =3D 0; stream < 4; ++stream) { + assert(op[stream] <=3D oend[stream]); + assert(ip[stream] >=3D ilowest); + } +#endif + /* Compute olimit */ + { + /* Each loop does 5 table lookups for each of the 4 streams. + * Each table lookup consumes up to 11 bits of input, and prod= uces + * up to 2 bytes of output. + */ + /* We can consume up to 7 bytes of input per iteration per str= eam. + * We also know that each input pointer is >=3D ip[0]. So we c= an run + * iters loops before running out of input. + */ + size_t iters =3D (size_t)(ip[0] - ilowest) / 7; + /* Each iteration can produce up to 10 bytes of output per str= eam. + * Each output stream may advance at different rates. So take t= he + * minimum number of safe iterations among all the output stre= ams. + */ + for (stream =3D 0; stream < 4; ++stream) { + size_t const oiters =3D (size_t)(oend[stream] - op[stream]= ) / 10; + iters =3D MIN(iters, oiters); + } + + /* Each iteration produces at least 5 output symbols. So until + * op[3] crosses olimit, we know we haven't executed iters + * iterations yet. This saves us maintaining an iters counter, + * at the expense of computing the remaining # of iterations + * more frequently. + */ + olimit =3D op[3] + (iters * 5); + + /* Exit the fast decoding loop once we reach the end. */ + if (op[3] =3D=3D olimit) + break; + + /* Exit the decoding loop if any input pointer has crossed the + * previous one. This indicates corruption, and a precondition + * to our loop is that ip[i] >=3D ip[0]. + */ + for (stream =3D 1; stream < 4; ++stream) { + if (ip[stream] < ip[stream - 1]) + goto _out; + } + } + +#ifndef NDEBUG + for (stream =3D 1; stream < 4; ++stream) { + assert(ip[stream] >=3D ip[stream - 1]); + } +#endif
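The olimit computation above is what lets the unrolled loop below run without any per-symbol bounds checks: the per-iteration worst cases (at most 7 input bytes consumed and 10 output bytes produced per stream, at least 5 output bytes written to stream 3) are folded into a single pointer threshold. The following standalone sketch restates that arithmetic; it is illustrative only, not code from this patch, and compute_olimit is a made-up name:

    #include <stddef.h>

    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    static unsigned char* compute_olimit(unsigned char* op[4],
                                         unsigned char* oend[4],
                                         const unsigned char* ip0,
                                         const unsigned char* ilowest)
    {
        /* Every input pointer is >= ip[0], so ip[0]'s headroom bounds all
         * four streams: 5 lookups x 11 bits < 7 bytes per iteration. */
        size_t iters = (size_t)(ip0 - ilowest) / 7;
        int s;
        for (s = 0; s < 4; ++s) {
            /* 5 lookups x 2 bytes: at most 10 output bytes per iteration,
             * so the tightest output stream limits everyone. */
            size_t const oiters = (size_t)(oend[s] - op[s]) / 10;
            iters = MIN(iters, oiters);
        }
        /* Stream 3 writes at least 5 bytes per iteration, so testing
         * op[3] < olimit replaces an explicit iteration counter. */
        return op[3] + iters * 5;
    }

When iters comes out as 0, the threshold equals op[3] itself, which is exactly the early exit taken before the unrolled loop is entered.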
=20 -static HUF_ASM_X86_64_BMI2_ATTRS size_t -HUF_decompress4X2_usingDTable_internal_bmi2_asm( +#define HUF_4X2_DECODE_SYMBOL(_stream, _decode3) \ + do { \ + if ((_decode3) || (_stream) !=3D 3) { \ + int const index =3D (int)(bits[(_stream)] >> 53); \ + HUF_DEltX2 const entry =3D dtable[index]; \ + MEM_write16(op[(_stream)], entry.sequence); \ + bits[(_stream)] <<=3D (entry.nbBits) & 0x3F; \ + op[(_stream)] +=3D (entry.length); \ + } \ + } while (0) + +#define HUF_4X2_RELOAD_STREAM(_stream) \ + do { \ + HUF_4X2_DECODE_SYMBOL(3, 1); \ + { \ + int const ctz =3D ZSTD_countTrailingZeros64(bits[(_stream)]); \ + int const nbBits =3D ctz & 7; \ + int const nbBytes =3D ctz >> 3; \ + ip[(_stream)] -=3D nbBytes; \ + bits[(_stream)] =3D MEM_read64(ip[(_stream)]) | 1; \ + bits[(_stream)] <<=3D nbBits; \ + } \ + } while (0) + + /* Manually unroll the loop because compilers don't consistently + * unroll the inner loops, which destroys performance. + */ + do { + /* Decode 5 symbols from each of the first 3 streams. + * The final stream will be decoded during the reload phase + * to reduce register pressure. + */ + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X2_DECODE_SYMBOL, 0); + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X2_DECODE_SYMBOL, 0); + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X2_DECODE_SYMBOL, 0); + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X2_DECODE_SYMBOL, 0); + HUF_4X_FOR_EACH_STREAM_WITH_VAR(HUF_4X2_DECODE_SYMBOL, 0); + + /* Decode one symbol from the final stream */ + HUF_4X2_DECODE_SYMBOL(3, 1); + + /* Decode 4 symbols from the final stream & reload bitstreams. + * The final stream is reloaded last, meaning that all 5 symbo= ls + * are decoded from the final stream before it is reloaded. + */ + HUF_4X_FOR_EACH_STREAM(HUF_4X2_RELOAD_STREAM); + } while (op[3] < olimit); + } + +#undef HUF_4X2_DECODE_SYMBOL +#undef HUF_4X2_RELOAD_STREAM + +_out: + + /* Save the final values of each of the state variables back to args.
= */ + ZSTD_memcpy(&args->bits, &bits, sizeof(bits)); + ZSTD_memcpy((void*)(&args->ip), &ip, sizeof(ip)); + ZSTD_memcpy(&args->op, &op, sizeof(op)); +} + + +static HUF_FAST_BMI2_ATTRS size_t +HUF_decompress4X2_usingDTable_internal_fast( void* dst, size_t dstSize, const void* cSrc, size_t cSrcSize, - const HUF_DTable* DTable) { + const HUF_DTable* DTable, + HUF_DecompressFastLoopFn loopFn) { void const* dt =3D DTable + 1; - const BYTE* const iend =3D (const BYTE*)cSrc + 6; - BYTE* const oend =3D (BYTE*)dst + dstSize; - HUF_DecompressAsmArgs args; + const BYTE* const ilowest =3D (const BYTE*)cSrc; + BYTE* const oend =3D ZSTD_maybeNullPtrAdd((BYTE*)dst, dstSize); + HUF_DecompressFastArgs args; { - size_t const ret =3D HUF_DecompressAsmArgs_init(&args, dst, dstSiz= e, cSrc, cSrcSize, DTable); + size_t const ret =3D HUF_DecompressFastArgs_init(&args, dst, dstSi= ze, cSrc, cSrcSize, DTable); FORWARD_IF_ERROR(ret, "Failed to init asm args"); - if (ret !=3D 0) - return HUF_decompress4X2_usingDTable_internal_bmi2(dst, dstSiz= e, cSrc, cSrcSize, DTable); + if (ret =3D=3D 0) + return 0; } =20 - assert(args.ip[0] >=3D args.ilimit); - HUF_decompress4X2_usingDTable_internal_bmi2_asm_loop(&args); + assert(args.ip[0] >=3D args.ilowest); + loopFn(&args); =20 /* note : op4 already verified within main loop */ - assert(args.ip[0] >=3D iend); - assert(args.ip[1] >=3D iend); - assert(args.ip[2] >=3D iend); - assert(args.ip[3] >=3D iend); + assert(args.ip[0] >=3D ilowest); + assert(args.ip[1] >=3D ilowest); + assert(args.ip[2] >=3D ilowest); + assert(args.ip[3] >=3D ilowest); assert(args.op[3] <=3D oend); - (void)iend; + + assert(ilowest =3D=3D args.ilowest); + assert(ilowest + 6 =3D=3D args.iend[0]); + (void)ilowest; =20 /* finish bitStreams one by one */ { @@ -1426,91 +1712,72 @@ HUF_decompress4X2_usingDTable_internal_bmi2_asm( /* decoded size */ return dstSize; } -#endif /* ZSTD_ENABLE_ASM_X86_64_BMI2 */ =20 static size_t HUF_decompress4X2_usingDTable_internal(void* dst, size_t dst= Size, void const* cSrc, - size_t cSrcSize, HUF_DTable const* DTable, int bmi2) + size_t cSrcSize, HUF_DTable const* DTable, int flags) { + HUF_DecompressUsingDTableFn fallbackFn =3D HUF_decompress4X2_usingDTab= le_internal_default; + HUF_DecompressFastLoopFn loopFn =3D HUF_decompress4X2_usingDTable_inte= rnal_fast_c_loop; + #if DYNAMIC_BMI2 - if (bmi2) { + if (flags & HUF_flags_bmi2) { + fallbackFn =3D HUF_decompress4X2_usingDTable_internal_bmi2; # if ZSTD_ENABLE_ASM_X86_64_BMI2 - return HUF_decompress4X2_usingDTable_internal_bmi2_asm(dst, dstSiz= e, cSrc, cSrcSize, DTable); -# else - return HUF_decompress4X2_usingDTable_internal_bmi2(dst, dstSize, c= Src, cSrcSize, DTable); + if (!(flags & HUF_flags_disableAsm)) { + loopFn =3D HUF_decompress4X2_usingDTable_internal_fast_asm_loo= p; + } # endif + } else { + return fallbackFn(dst, dstSize, cSrc, cSrcSize, DTable); } -#else - (void)bmi2; #endif =20 #if ZSTD_ENABLE_ASM_X86_64_BMI2 && defined(__BMI2__) - return HUF_decompress4X2_usingDTable_internal_bmi2_asm(dst, dstSize, c= Src, cSrcSize, DTable); -#else - return HUF_decompress4X2_usingDTable_internal_default(dst, dstSize, cS= rc, cSrcSize, DTable); + if (!(flags & HUF_flags_disableAsm)) { + loopFn =3D HUF_decompress4X2_usingDTable_internal_fast_asm_loop; + } #endif + + if (HUF_ENABLE_FAST_DECODE && !(flags & HUF_flags_disableFast)) { + size_t const ret =3D HUF_decompress4X2_usingDTable_internal_fast(d= st, dstSize, cSrc, cSrcSize, DTable, loopFn); + if (ret !=3D 0) + return ret; + } + return fallbackFn(dst, dstSize, cSrc, cSrcSize, 
DTable); } =20 HUF_DGEN(HUF_decompress1X2_usingDTable_internal) =20 -size_t HUF_decompress1X2_usingDTable( - void* dst, size_t dstSize, - const void* cSrc, size_t cSrcSize, - const HUF_DTable* DTable) -{ - DTableDesc dtd =3D HUF_getDTableDesc(DTable); - if (dtd.tableType !=3D 1) return ERROR(GENERIC); - return HUF_decompress1X2_usingDTable_internal(dst, dstSize, cSrc, cSrc= Size, DTable, /* bmi2 */ 0); -} - size_t HUF_decompress1X2_DCtx_wksp(HUF_DTable* DCtx, void* dst, size_t dst= Size, const void* cSrc, size_t cSrcSize, - void* workSpace, size_t wkspSize) + void* workSpace, size_t wkspSize, int f= lags) { const BYTE* ip =3D (const BYTE*) cSrc; =20 size_t const hSize =3D HUF_readDTableX2_wksp(DCtx, cSrc, cSrcSize, - workSpace, wkspSize); + workSpace, wkspSize, flags); if (HUF_isError(hSize)) return hSize; if (hSize >=3D cSrcSize) return ERROR(srcSize_wrong); ip +=3D hSize; cSrcSize -=3D hSize; =20 - return HUF_decompress1X2_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, DCtx, /* bmi2 */ 0); + return HUF_decompress1X2_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, DCtx, flags); } =20 - -size_t HUF_decompress4X2_usingDTable( - void* dst, size_t dstSize, - const void* cSrc, size_t cSrcSize, - const HUF_DTable* DTable) -{ - DTableDesc dtd =3D HUF_getDTableDesc(DTable); - if (dtd.tableType !=3D 1) return ERROR(GENERIC); - return HUF_decompress4X2_usingDTable_internal(dst, dstSize, cSrc, cSrc= Size, DTable, /* bmi2 */ 0); -} - -static size_t HUF_decompress4X2_DCtx_wksp_bmi2(HUF_DTable* dctx, void* dst= , size_t dstSize, +static size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable* dctx, void* dst, siz= e_t dstSize, const void* cSrc, size_t cSrcSize, - void* workSpace, size_t wkspSize, int b= mi2) + void* workSpace, size_t wkspSize, int f= lags) { const BYTE* ip =3D (const BYTE*) cSrc; =20 size_t hSize =3D HUF_readDTableX2_wksp(dctx, cSrc, cSrcSize, - workSpace, wkspSize); + workSpace, wkspSize, flags); if (HUF_isError(hSize)) return hSize; if (hSize >=3D cSrcSize) return ERROR(srcSize_wrong); ip +=3D hSize; cSrcSize -=3D hSize; =20 - return HUF_decompress4X2_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, dctx, bmi2); + return HUF_decompress4X2_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, dctx, flags); } =20 -size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dst= Size, - const void* cSrc, size_t cSrcSize, - void* workSpace, size_t wkspSize) -{ - return HUF_decompress4X2_DCtx_wksp_bmi2(dctx, dst, dstSize, cSrc, cSrc= Size, workSpace, wkspSize, /* bmi2 */ 0); -} - - #endif /* HUF_FORCE_DECOMPRESS_X1 */ =20 =20 @@ -1518,44 +1785,6 @@ size_t HUF_decompress4X2_DCtx_wksp(HUF_DTable* dctx,= void* dst, size_t dstSize, /* Universal decompression selectors */ /* ***********************************/ =20 -size_t HUF_decompress1X_usingDTable(void* dst, size_t maxDstSize, - const void* cSrc, size_t cSrcSize, - const HUF_DTable* DTable) -{ - DTableDesc const dtd =3D HUF_getDTableDesc(DTable); -#if defined(HUF_FORCE_DECOMPRESS_X1) - (void)dtd; - assert(dtd.tableType =3D=3D 0); - return HUF_decompress1X1_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, /* bmi2 */ 0); -#elif defined(HUF_FORCE_DECOMPRESS_X2) - (void)dtd; - assert(dtd.tableType =3D=3D 1); - return HUF_decompress1X2_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, /* bmi2 */ 0); -#else - return dtd.tableType ? 
HUF_decompress1X2_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0) : - HUF_decompress1X1_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0); -#endif -} - -size_t HUF_decompress4X_usingDTable(void* dst, size_t maxDstSize, - const void* cSrc, size_t cSrcSize, - const HUF_DTable* DTable) -{ - DTableDesc const dtd =3D HUF_getDTableDesc(DTable); -#if defined(HUF_FORCE_DECOMPRESS_X1) - (void)dtd; - assert(dtd.tableType =3D=3D 0); - return HUF_decompress4X1_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, /* bmi2 */ 0); -#elif defined(HUF_FORCE_DECOMPRESS_X2) - (void)dtd; - assert(dtd.tableType =3D=3D 1); - return HUF_decompress4X2_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, /* bmi2 */ 0); -#else - return dtd.tableType ? HUF_decompress4X2_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0) : - HUF_decompress4X1_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, /* bmi2 */ 0); -#endif -} - =20 #if !defined(HUF_FORCE_DECOMPRESS_X1) && !defined(HUF_FORCE_DECOMPRESS_X2) typedef struct { U32 tableTime; U32 decode256Time; } algo_time_t; @@ -1610,36 +1839,9 @@ U32 HUF_selectDecoder (size_t dstSize, size_t cSrcSi= ze) #endif } =20 - -size_t HUF_decompress4X_hufOnly_wksp(HUF_DTable* dctx, void* dst, - size_t dstSize, const void* cSrc, - size_t cSrcSize, void* workSpace, - size_t wkspSize) -{ - /* validation checks */ - if (dstSize =3D=3D 0) return ERROR(dstSize_tooSmall); - if (cSrcSize =3D=3D 0) return ERROR(corruption_detected); - - { U32 const algoNb =3D HUF_selectDecoder(dstSize, cSrcSize); -#if defined(HUF_FORCE_DECOMPRESS_X1) - (void)algoNb; - assert(algoNb =3D=3D 0); - return HUF_decompress4X1_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcS= ize, workSpace, wkspSize); -#elif defined(HUF_FORCE_DECOMPRESS_X2) - (void)algoNb; - assert(algoNb =3D=3D 1); - return HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcS= ize, workSpace, wkspSize); -#else - return algoNb ? HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cS= rc, - cSrcSize, workSpace, wkspSize): - HUF_decompress4X1_DCtx_wksp(dctx, dst, dstSize, cS= rc, cSrcSize, workSpace, wkspSize); -#endif - } -} - size_t HUF_decompress1X_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dstS= ize, const void* cSrc, size_t cSrcSize, - void* workSpace, size_t wkspSize) + void* workSpace, size_t wkspSize, int fl= ags) { /* validation checks */ if (dstSize =3D=3D 0) return ERROR(dstSize_tooSmall); @@ -1652,71 +1854,71 @@ size_t HUF_decompress1X_DCtx_wksp(HUF_DTable* dctx,= void* dst, size_t dstSize, (void)algoNb; assert(algoNb =3D=3D 0); return HUF_decompress1X1_DCtx_wksp(dctx, dst, dstSize, cSrc, - cSrcSize, workSpace, wkspSize); + cSrcSize, workSpace, wkspSize, flags); #elif defined(HUF_FORCE_DECOMPRESS_X2) (void)algoNb; assert(algoNb =3D=3D 1); return HUF_decompress1X2_DCtx_wksp(dctx, dst, dstSize, cSrc, - cSrcSize, workSpace, wkspSize); + cSrcSize, workSpace, wkspSize, flags); #else return algoNb ? 
HUF_decompress1X2_DCtx_wksp(dctx, dst, dstSize, cS= rc, - cSrcSize, workSpace, wkspSize): + cSrcSize, workSpace, wkspSize, flags): HUF_decompress1X1_DCtx_wksp(dctx, dst, dstSize, cS= rc, - cSrcSize, workSpace, wkspSize); + cSrcSize, workSpace, wkspSize, flags); #endif } } =20 =20 -size_t HUF_decompress1X_usingDTable_bmi2(void* dst, size_t maxDstSize, con= st void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int bmi2) +size_t HUF_decompress1X_usingDTable(void* dst, size_t maxDstSize, const vo= id* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int flags) { DTableDesc const dtd =3D HUF_getDTableDesc(DTable); #if defined(HUF_FORCE_DECOMPRESS_X1) (void)dtd; assert(dtd.tableType =3D=3D 0); - return HUF_decompress1X1_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, bmi2); + return HUF_decompress1X1_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, flags); #elif defined(HUF_FORCE_DECOMPRESS_X2) (void)dtd; assert(dtd.tableType =3D=3D 1); - return HUF_decompress1X2_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, bmi2); + return HUF_decompress1X2_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, flags); #else - return dtd.tableType ? HUF_decompress1X2_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, bmi2) : - HUF_decompress1X1_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, bmi2); + return dtd.tableType ? HUF_decompress1X2_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, flags) : + HUF_decompress1X1_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, flags); #endif } =20 #ifndef HUF_FORCE_DECOMPRESS_X2 -size_t HUF_decompress1X1_DCtx_wksp_bmi2(HUF_DTable* dctx, void* dst, size_= t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspS= ize, int bmi2) +size_t HUF_decompress1X1_DCtx_wksp(HUF_DTable* dctx, void* dst, size_t dst= Size, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize, = int flags) { const BYTE* ip =3D (const BYTE*) cSrc; =20 - size_t const hSize =3D HUF_readDTableX1_wksp_bmi2(dctx, cSrc, cSrcSize= , workSpace, wkspSize, bmi2); + size_t const hSize =3D HUF_readDTableX1_wksp(dctx, cSrc, cSrcSize, wor= kSpace, wkspSize, flags); if (HUF_isError(hSize)) return hSize; if (hSize >=3D cSrcSize) return ERROR(srcSize_wrong); ip +=3D hSize; cSrcSize -=3D hSize; =20 - return HUF_decompress1X1_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, dctx, bmi2); + return HUF_decompress1X1_usingDTable_internal(dst, dstSize, ip, cSrcSi= ze, dctx, flags); } #endif =20 -size_t HUF_decompress4X_usingDTable_bmi2(void* dst, size_t maxDstSize, con= st void* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int bmi2) +size_t HUF_decompress4X_usingDTable(void* dst, size_t maxDstSize, const vo= id* cSrc, size_t cSrcSize, const HUF_DTable* DTable, int flags) { DTableDesc const dtd =3D HUF_getDTableDesc(DTable); #if defined(HUF_FORCE_DECOMPRESS_X1) (void)dtd; assert(dtd.tableType =3D=3D 0); - return HUF_decompress4X1_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, bmi2); + return HUF_decompress4X1_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, flags); #elif defined(HUF_FORCE_DECOMPRESS_X2) (void)dtd; assert(dtd.tableType =3D=3D 1); - return HUF_decompress4X2_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, bmi2); + return HUF_decompress4X2_usingDTable_internal(dst, maxDstSize, cSrc, c= SrcSize, DTable, flags); #else - return dtd.tableType ? 
HUF_decompress4X2_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, bmi2) : - HUF_decompress4X1_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, bmi2); + return dtd.tableType ? HUF_decompress4X2_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, flags) : + HUF_decompress4X1_usingDTable_internal(dst, max= DstSize, cSrc, cSrcSize, DTable, flags); #endif } =20 -size_t HUF_decompress4X_hufOnly_wksp_bmi2(HUF_DTable* dctx, void* dst, siz= e_t dstSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wks= pSize, int bmi2) +size_t HUF_decompress4X_hufOnly_wksp(HUF_DTable* dctx, void* dst, size_t d= stSize, const void* cSrc, size_t cSrcSize, void* workSpace, size_t wkspSize= , int flags) { /* validation checks */ if (dstSize =3D=3D 0) return ERROR(dstSize_tooSmall); @@ -1726,15 +1928,14 @@ size_t HUF_decompress4X_hufOnly_wksp_bmi2(HUF_DTabl= e* dctx, void* dst, size_t ds #if defined(HUF_FORCE_DECOMPRESS_X1) (void)algoNb; assert(algoNb =3D=3D 0); - return HUF_decompress4X1_DCtx_wksp_bmi2(dctx, dst, dstSize, cSrc, = cSrcSize, workSpace, wkspSize, bmi2); + return HUF_decompress4X1_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcS= ize, workSpace, wkspSize, flags); #elif defined(HUF_FORCE_DECOMPRESS_X2) (void)algoNb; assert(algoNb =3D=3D 1); - return HUF_decompress4X2_DCtx_wksp_bmi2(dctx, dst, dstSize, cSrc, = cSrcSize, workSpace, wkspSize, bmi2); + return HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cSrc, cSrcS= ize, workSpace, wkspSize, flags); #else - return algoNb ? HUF_decompress4X2_DCtx_wksp_bmi2(dctx, dst, dstSiz= e, cSrc, cSrcSize, workSpace, wkspSize, bmi2) : - HUF_decompress4X1_DCtx_wksp_bmi2(dctx, dst, dstSiz= e, cSrc, cSrcSize, workSpace, wkspSize, bmi2); + return algoNb ? HUF_decompress4X2_DCtx_wksp(dctx, dst, dstSize, cS= rc, cSrcSize, workSpace, wkspSize, flags) : + HUF_decompress4X1_DCtx_wksp(dctx, dst, dstSize, cS= rc, cSrcSize, workSpace, wkspSize, flags); #endif } } - diff --git a/lib/zstd/decompress/zstd_ddict.c b/lib/zstd/decompress/zstd_dd= ict.c index dbbc7919de53..30ef65e1ab5c 100644 --- a/lib/zstd/decompress/zstd_ddict.c +++ b/lib/zstd/decompress/zstd_ddict.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -14,12 +15,12 @@ /*-******************************************************* * Dependencies *********************************************************/ +#include "../common/allocations.h" /* ZSTD_customMalloc, ZSTD_customFree = */ #include "../common/zstd_deps.h" /* ZSTD_memcpy, ZSTD_memmove, ZSTD_mems= et */ #include "../common/cpu.h" /* bmi2 */ #include "../common/mem.h" /* low level memory routines */ #define FSE_STATIC_LINKING_ONLY #include "../common/fse.h" -#define HUF_STATIC_LINKING_ONLY #include "../common/huf.h" #include "zstd_decompress_internal.h" #include "zstd_ddict.h" @@ -131,7 +132,7 @@ static size_t ZSTD_initDDict_internal(ZSTD_DDict* ddict, ZSTD_memcpy(internalBuffer, dict, dictSize); } ddict->dictSize =3D dictSize; - ddict->entropy.hufTable[0] =3D (HUF_DTable)((HufLog)*0x1000001); /* c= over both little and big endian */ + ddict->entropy.hufTable[0] =3D (HUF_DTable)((ZSTD_HUFFDTABLE_CAPACITY_= LOG)*0x1000001); /* cover both little and big endian */ =20 /* parse dictionary content */ FORWARD_IF_ERROR( ZSTD_loadEntropy_intoDDict(ddict, dictContentType) ,= ""); @@ -237,5 +238,5 @@ size_t ZSTD_sizeof_DDict(const ZSTD_DDict* ddict) unsigned ZSTD_getDictID_fromDDict(const ZSTD_DDict* ddict) { if (ddict=3D=3DNULL) return 0; - return ZSTD_getDictID_fromDict(ddict->dictContent, ddict->dictSize); + return ddict->dictID; } diff --git a/lib/zstd/decompress/zstd_ddict.h b/lib/zstd/decompress/zstd_dd= ict.h index 8c1a79d666f8..de459a0dacd1 100644 --- a/lib/zstd/decompress/zstd_ddict.h +++ b/lib/zstd/decompress/zstd_ddict.h @@ -1,5 +1,6 @@ +/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */ /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the diff --git a/lib/zstd/decompress/zstd_decompress.c b/lib/zstd/decompress/zs= td_decompress.c index 6b3177c94711..bb009554e3a6 100644 --- a/lib/zstd/decompress/zstd_decompress.c +++ b/lib/zstd/decompress/zstd_decompress.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. 
* * This source code is licensed under both the BSD-style license (found in= the @@ -53,13 +54,15 @@ * Dependencies *********************************************************/ #include "../common/zstd_deps.h" /* ZSTD_memcpy, ZSTD_memmove, ZSTD_mems= et */ +#include "../common/allocations.h" /* ZSTD_customMalloc, ZSTD_customCallo= c, ZSTD_customFree */ +#include "../common/error_private.h" +#include "../common/zstd_internal.h" /* blockProperties_t */ #include "../common/mem.h" /* low level memory routines */ +#include "../common/bits.h" /* ZSTD_highbit32 */ #define FSE_STATIC_LINKING_ONLY #include "../common/fse.h" -#define HUF_STATIC_LINKING_ONLY #include "../common/huf.h" #include /* xxh64_reset, xxh64_update, xxh64_digest, XXH6= 4 */ -#include "../common/zstd_internal.h" /* blockProperties_t */ #include "zstd_decompress_internal.h" /* ZSTD_DCtx */ #include "zstd_ddict.h" /* ZSTD_DDictDictContent */ #include "zstd_decompress_block.h" /* ZSTD_decompressBlock_internal */ @@ -72,11 +75,11 @@ *************************************/ =20 #define DDICT_HASHSET_MAX_LOAD_FACTOR_COUNT_MULT 4 -#define DDICT_HASHSET_MAX_LOAD_FACTOR_SIZE_MULT 3 /* These two constants= represent SIZE_MULT/COUNT_MULT load factor without using a float. - * Currently, that mea= ns a 0.75 load factor. - * So, if count * COUN= T_MULT / size * SIZE_MULT !=3D 0, then we've exceeded - * the load factor of = the ddict hash set. - */ +#define DDICT_HASHSET_MAX_LOAD_FACTOR_SIZE_MULT 3 /* These two constants = represent SIZE_MULT/COUNT_MULT load factor without using a float. + * Currently, that mean= s a 0.75 load factor. + * So, if count * COUNT= _MULT / size * SIZE_MULT !=3D 0, then we've exceeded + * the load factor of t= he ddict hash set. + */ =20 #define DDICT_HASHSET_TABLE_BASE_SIZE 64 #define DDICT_HASHSET_RESIZE_FACTOR 2 @@ -237,6 +240,8 @@ static void ZSTD_DCtx_resetParameters(ZSTD_DCtx* dctx) dctx->outBufferMode =3D ZSTD_bm_buffered; dctx->forceIgnoreChecksum =3D ZSTD_d_validateChecksum; dctx->refMultipleDDicts =3D ZSTD_rmd_refSingleDDict; + dctx->disableHufAsm =3D 0; + dctx->maxBlockSizeParam =3D 0; } =20 static void ZSTD_initDCtx_internal(ZSTD_DCtx* dctx) @@ -253,6 +258,7 @@ static void ZSTD_initDCtx_internal(ZSTD_DCtx* dctx) dctx->streamStage =3D zdss_init; dctx->noForwardProgress =3D 0; dctx->oversizedDuration =3D 0; + dctx->isFrameDecompression =3D 1; #if DYNAMIC_BMI2 dctx->bmi2 =3D ZSTD_cpuSupportsBmi2(); #endif @@ -421,16 +427,40 @@ size_t ZSTD_frameHeaderSize(const void* src, size_t s= rcSize) * note : only works for formats ZSTD_f_zstd1 and ZSTD_f_zstd1_magicless * @return : 0, `zfhPtr` is correctly filled, * >0, `srcSize` is too small, value is wanted `srcSize` amount, - * or an error code, which can be tested using ZSTD_isError() */ -size_t ZSTD_getFrameHeader_advanced(ZSTD_frameHeader* zfhPtr, const void* = src, size_t srcSize, ZSTD_format_e format) +** or an error code, which can be tested using ZSTD_isError() */ +size_t ZSTD_getFrameHeader_advanced(ZSTD_FrameHeader* zfhPtr, const void* = src, size_t srcSize, ZSTD_format_e format) { const BYTE* ip =3D (const BYTE*)src; size_t const minInputSize =3D ZSTD_startingInputLength(format); =20 - ZSTD_memset(zfhPtr, 0, sizeof(*zfhPtr)); /* not strictly necessary, = but static analyzer do not understand that zfhPtr is only going to be read = only if return value is zero, since they are 2 different signals */ - if (srcSize < minInputSize) return minInputSize; - RETURN_ERROR_IF(src=3D=3DNULL, GENERIC, "invalid parameter"); + DEBUGLOG(5, "ZSTD_getFrameHeader_advanced: 
minInputSize =3D %zu, srcSi= ze =3D %zu", minInputSize, srcSize); + + if (srcSize > 0) { + /* note : technically could be considered an assert(), since it's = an invalid entry */ + RETURN_ERROR_IF(src=3D=3DNULL, GENERIC, "invalid parameter : src= =3D=3DNULL, but srcSize>0"); + } + if (srcSize < minInputSize) { + if (srcSize > 0 && format !=3D ZSTD_f_zstd1_magicless) { + /* when receiving less than @minInputSize bytes, + * control these bytes at least correspond to a supported magi= c number + * in order to error out early if they don't. + **/ + size_t const toCopy =3D MIN(4, srcSize); + unsigned char hbuf[4]; MEM_writeLE32(hbuf, ZSTD_MAGICNUMBER); + assert(src !=3D NULL); + ZSTD_memcpy(hbuf, src, toCopy); + if ( MEM_readLE32(hbuf) !=3D ZSTD_MAGICNUMBER ) { + /* not a zstd frame : let's check if it's a skippable fram= e */ + MEM_writeLE32(hbuf, ZSTD_MAGIC_SKIPPABLE_START); + ZSTD_memcpy(hbuf, src, toCopy); + if ((MEM_readLE32(hbuf) & ZSTD_MAGIC_SKIPPABLE_MASK) !=3D = ZSTD_MAGIC_SKIPPABLE_START) { + RETURN_ERROR(prefix_unknown, + "first bytes don't correspond to any suppo= rted magic number"); + } } } + return minInputSize; + } =20 + ZSTD_memset(zfhPtr, 0, sizeof(*zfhPtr)); /* not strictly necessary, = but static analyzers may not understand that zfhPtr will be read only if re= turn value is zero, since they are 2 different signals */ if ( (format !=3D ZSTD_f_zstd1_magicless) && (MEM_readLE32(src) !=3D ZSTD_MAGICNUMBER) ) { if ((MEM_readLE32(src) & ZSTD_MAGIC_SKIPPABLE_MASK) =3D=3D ZSTD_MA= GIC_SKIPPABLE_START) { @@ -438,8 +468,10 @@ size_t ZSTD_getFrameHeader_advanced(ZSTD_frameHeader* = zfhPtr, const void* src, s if (srcSize < ZSTD_SKIPPABLEHEADERSIZE) return ZSTD_SKIPPABLEHEADERSIZE; /* magic number + frame l= ength */ ZSTD_memset(zfhPtr, 0, sizeof(*zfhPtr)); - zfhPtr->frameContentSize =3D MEM_readLE32((const char *)src + = ZSTD_FRAMEIDSIZE); zfhPtr->frameType =3D ZSTD_skippableFrame; + zfhPtr->dictID =3D MEM_readLE32(src) - ZSTD_MAGIC_SKIPPABLE_ST= ART; + zfhPtr->headerSize =3D ZSTD_SKIPPABLEHEADERSIZE; + zfhPtr->frameContentSize =3D MEM_readLE32((const char *)src + = ZSTD_FRAMEIDSIZE); return 0; } RETURN_ERROR(prefix_unknown, ""); @@ -508,7 +540,7 @@ size_t ZSTD_getFrameHeader_advanced(ZSTD_frameHeader* z= fhPtr, const void* src, s * @return : 0, `zfhPtr` is correctly filled, * >0, `srcSize` is too small, value is wanted `srcSize` amount, * or an error code, which can be tested using ZSTD_isError() */ -size_t ZSTD_getFrameHeader(ZSTD_frameHeader* zfhPtr, const void* src, size= _t srcSize) +size_t ZSTD_getFrameHeader(ZSTD_FrameHeader* zfhPtr, const void* src, size= _t srcSize) { return ZSTD_getFrameHeader_advanced(zfhPtr, src, srcSize, ZSTD_f_zstd1= ); } @@ -520,7 +552,7 @@ size_t ZSTD_getFrameHeader(ZSTD_frameHeader* zfhPtr, co= nst void* src, size_t src * - ZSTD_CONTENTSIZE_ERROR if an error occurred (e.g. 
invalid mag= ic number, srcSize too small) */ unsigned long long ZSTD_getFrameContentSize(const void *src, size_t srcSiz= e) { - { ZSTD_frameHeader zfh; + { ZSTD_FrameHeader zfh; if (ZSTD_getFrameHeader(&zfh, src, srcSize) !=3D 0) return ZSTD_CONTENTSIZE_ERROR; if (zfh.frameType =3D=3D ZSTD_skippableFrame) { @@ -540,49 +572,52 @@ static size_t readSkippableFrameSize(void const* src,= size_t srcSize) sizeU32 =3D MEM_readLE32((BYTE const*)src + ZSTD_FRAMEIDSIZE); RETURN_ERROR_IF((U32)(sizeU32 + ZSTD_SKIPPABLEHEADERSIZE) < sizeU32, frameParameter_unsupported, ""); - { - size_t const skippableSize =3D skippableHeaderSize + sizeU32; + { size_t const skippableSize =3D skippableHeaderSize + sizeU32; RETURN_ERROR_IF(skippableSize > srcSize, srcSize_wrong, ""); return skippableSize; } } =20 /*! ZSTD_readSkippableFrame() : - * Retrieves a zstd skippable frame containing data given by src, and writ= es it to dst buffer. + * Retrieves content of a skippable frame, and writes it to dst buffer. * * The parameter magicVariant will receive the magicVariant that was suppl= ied when the frame was written, * i.e. magicNumber - ZSTD_MAGIC_SKIPPABLE_START. This can be NULL if the= caller is not interested * in the magicVariant. * - * Returns an error if destination buffer is not large enough, or if the f= rame is not skippable. + * Returns an error if destination buffer is not large enough, or if this = is not a valid skippable frame. * * @return : number of bytes written or a ZSTD error. */ -ZSTDLIB_API size_t ZSTD_readSkippableFrame(void* dst, size_t dstCapacity, = unsigned* magicVariant, - const void* src, size_t srcSiz= e) +size_t ZSTD_readSkippableFrame(void* dst, size_t dstCapacity, + unsigned* magicVariant, /* optional, can b= e NULL */ + const void* src, size_t srcSize) { - U32 const magicNumber =3D MEM_readLE32(src); - size_t skippableFrameSize =3D readSkippableFrameSize(src, srcSize); - size_t skippableContentSize =3D skippableFrameSize - ZSTD_SKIPPABLEHEA= DERSIZE; - - /* check input validity */ - RETURN_ERROR_IF(!ZSTD_isSkippableFrame(src, srcSize), frameParameter_u= nsupported, ""); - RETURN_ERROR_IF(skippableFrameSize < ZSTD_SKIPPABLEHEADERSIZE || skipp= ableFrameSize > srcSize, srcSize_wrong, ""); - RETURN_ERROR_IF(skippableContentSize > dstCapacity, dstSize_tooSmall, = ""); + RETURN_ERROR_IF(srcSize < ZSTD_SKIPPABLEHEADERSIZE, srcSize_wrong, ""); =20 - /* deliver payload */ - if (skippableContentSize > 0 && dst !=3D NULL) - ZSTD_memcpy(dst, (const BYTE *)src + ZSTD_SKIPPABLEHEADERSIZE, ski= ppableContentSize); - if (magicVariant !=3D NULL) - *magicVariant =3D magicNumber - ZSTD_MAGIC_SKIPPABLE_START; - return skippableContentSize; + { U32 const magicNumber =3D MEM_readLE32(src); + size_t skippableFrameSize =3D readSkippableFrameSize(src, srcSize); + size_t skippableContentSize =3D skippableFrameSize - ZSTD_SKIPPABL= EHEADERSIZE; + + /* check input validity */ + RETURN_ERROR_IF(!ZSTD_isSkippableFrame(src, srcSize), frameParamet= er_unsupported, ""); + RETURN_ERROR_IF(skippableFrameSize < ZSTD_SKIPPABLEHEADERSIZE || s= kippableFrameSize > srcSize, srcSize_wrong, ""); + RETURN_ERROR_IF(skippableContentSize > dstCapacity, dstSize_tooSma= ll, ""); + + /* deliver payload */ + if (skippableContentSize > 0 && dst !=3D NULL) + ZSTD_memcpy(dst, (const BYTE *)src + ZSTD_SKIPPABLEHEADERSIZE,= skippableContentSize); + if (magicVariant !=3D NULL) + *magicVariant =3D magicNumber - ZSTD_MAGIC_SKIPPABLE_START; + return skippableContentSize; + } } =20 /* ZSTD_findDecompressedSize() : - * compatible with legacy 
mode * `srcSize` must be the exact length of some number of ZSTD compressed a= nd/or * skippable frames - * @return : decompressed size of the frames contained */ + * note: compatible with legacy mode + * @return : decompressed size of the frames contained */ unsigned long long ZSTD_findDecompressedSize(const void* src, size_t srcSi= ze) { unsigned long long totalDstSize =3D 0; @@ -592,9 +627,7 @@ unsigned long long ZSTD_findDecompressedSize(const void= * src, size_t srcSize) =20 if ((magicNumber & ZSTD_MAGIC_SKIPPABLE_MASK) =3D=3D ZSTD_MAGIC_SK= IPPABLE_START) { size_t const skippableSize =3D readSkippableFrameSize(src, src= Size); - if (ZSTD_isError(skippableSize)) { - return ZSTD_CONTENTSIZE_ERROR; - } + if (ZSTD_isError(skippableSize)) return ZSTD_CONTENTSIZE_ERROR; assert(skippableSize <=3D srcSize); =20 src =3D (const BYTE *)src + skippableSize; @@ -602,17 +635,17 @@ unsigned long long ZSTD_findDecompressedSize(const vo= id* src, size_t srcSize) continue; } =20 - { unsigned long long const ret =3D ZSTD_getFrameContentSize(src,= srcSize); - if (ret >=3D ZSTD_CONTENTSIZE_ERROR) return ret; + { unsigned long long const fcs =3D ZSTD_getFrameContentSize(src,= srcSize); + if (fcs >=3D ZSTD_CONTENTSIZE_ERROR) return fcs; =20 - /* check for overflow */ - if (totalDstSize + ret < totalDstSize) return ZSTD_CONTENTSIZE= _ERROR; - totalDstSize +=3D ret; + if (totalDstSize + fcs < totalDstSize) + return ZSTD_CONTENTSIZE_ERROR; /* check for overflow */ + totalDstSize +=3D fcs; } + /* skip to next frame */ { size_t const frameSrcSize =3D ZSTD_findFrameCompressedSize(src= , srcSize); - if (ZSTD_isError(frameSrcSize)) { - return ZSTD_CONTENTSIZE_ERROR; - } + if (ZSTD_isError(frameSrcSize)) return ZSTD_CONTENTSIZE_ERROR; + assert(frameSrcSize <=3D srcSize); =20 src =3D (const BYTE *)src + frameSrcSize; srcSize -=3D frameSrcSize; @@ -676,13 +709,13 @@ static ZSTD_frameSizeInfo ZSTD_errorFrameSizeInfo(siz= e_t ret) return frameSizeInfo; } =20 -static ZSTD_frameSizeInfo ZSTD_findFrameSizeInfo(const void* src, size_t s= rcSize) +static ZSTD_frameSizeInfo ZSTD_findFrameSizeInfo(const void* src, size_t s= rcSize, ZSTD_format_e format) { ZSTD_frameSizeInfo frameSizeInfo; ZSTD_memset(&frameSizeInfo, 0, sizeof(ZSTD_frameSizeInfo)); =20 =20 - if ((srcSize >=3D ZSTD_SKIPPABLEHEADERSIZE) + if (format =3D=3D ZSTD_f_zstd1 && (srcSize >=3D ZSTD_SKIPPABLEHEADERSI= ZE) && (MEM_readLE32(src) & ZSTD_MAGIC_SKIPPABLE_MASK) =3D=3D ZSTD_MAG= IC_SKIPPABLE_START) { frameSizeInfo.compressedSize =3D readSkippableFrameSize(src, srcSi= ze); assert(ZSTD_isError(frameSizeInfo.compressedSize) || @@ -693,10 +726,10 @@ static ZSTD_frameSizeInfo ZSTD_findFrameSizeInfo(cons= t void* src, size_t srcSize const BYTE* const ipstart =3D ip; size_t remainingSize =3D srcSize; size_t nbBlocks =3D 0; - ZSTD_frameHeader zfh; + ZSTD_FrameHeader zfh; =20 /* Extract Frame Header */ - { size_t const ret =3D ZSTD_getFrameHeader(&zfh, src, srcSize); + { size_t const ret =3D ZSTD_getFrameHeader_advanced(&zfh, src, s= rcSize, format); if (ZSTD_isError(ret)) return ZSTD_errorFrameSizeInfo(ret); if (ret > 0) @@ -730,28 +763,31 @@ static ZSTD_frameSizeInfo ZSTD_findFrameSizeInfo(cons= t void* src, size_t srcSize ip +=3D 4; } =20 + frameSizeInfo.nbBlocks =3D nbBlocks; frameSizeInfo.compressedSize =3D (size_t)(ip - ipstart); frameSizeInfo.decompressedBound =3D (zfh.frameContentSize !=3D ZST= D_CONTENTSIZE_UNKNOWN) ? 
zfh.frameContentSize - : nbBlocks * zfh.blockSizeMax; + : (unsigned long long)nbBlocks * z= fh.blockSizeMax; return frameSizeInfo; } } =20 +static size_t ZSTD_findFrameCompressedSize_advanced(const void *src, size_= t srcSize, ZSTD_format_e format) { + ZSTD_frameSizeInfo const frameSizeInfo =3D ZSTD_findFrameSizeInfo(src,= srcSize, format); + return frameSizeInfo.compressedSize; +} + /* ZSTD_findFrameCompressedSize() : - * compatible with legacy mode - * `src` must point to the start of a ZSTD frame, ZSTD legacy frame, or s= kippable frame - * `srcSize` must be at least as large as the frame contained - * @return : the compressed size of the frame starting at `src` */ + * See docs in zstd.h + * Note: compatible with legacy mode */ size_t ZSTD_findFrameCompressedSize(const void *src, size_t srcSize) { - ZSTD_frameSizeInfo const frameSizeInfo =3D ZSTD_findFrameSizeInfo(src,= srcSize); - return frameSizeInfo.compressedSize; + return ZSTD_findFrameCompressedSize_advanced(src, srcSize, ZSTD_f_zstd= 1); } =20 /* ZSTD_decompressBound() : * compatible with legacy mode - * `src` must point to the start of a ZSTD frame or a skippeable frame + * `src` must point to the start of a ZSTD frame or a skippable frame * `srcSize` must be at least as large as the frame contained * @return : the maximum decompressed size of the compressed source */ @@ -760,7 +796,7 @@ unsigned long long ZSTD_decompressBound(const void* src= , size_t srcSize) unsigned long long bound =3D 0; /* Iterate over each frame */ while (srcSize > 0) { - ZSTD_frameSizeInfo const frameSizeInfo =3D ZSTD_findFrameSizeInfo(= src, srcSize); + ZSTD_frameSizeInfo const frameSizeInfo =3D ZSTD_findFrameSizeInfo(= src, srcSize, ZSTD_f_zstd1); size_t const compressedSize =3D frameSizeInfo.compressedSize; unsigned long long const decompressedBound =3D frameSizeInfo.decom= pressedBound; if (ZSTD_isError(compressedSize) || decompressedBound =3D=3D ZSTD_= CONTENTSIZE_ERROR) @@ -773,6 +809,48 @@ unsigned long long ZSTD_decompressBound(const void* sr= c, size_t srcSize) return bound; } =20 +size_t ZSTD_decompressionMargin(void const* src, size_t srcSize) +{ + size_t margin =3D 0; + unsigned maxBlockSize =3D 0; + + /* Iterate over each frame */ + while (srcSize > 0) { + ZSTD_frameSizeInfo const frameSizeInfo =3D ZSTD_findFrameSizeInfo(= src, srcSize, ZSTD_f_zstd1); + size_t const compressedSize =3D frameSizeInfo.compressedSize; + unsigned long long const decompressedBound =3D frameSizeInfo.decom= pressedBound; + ZSTD_FrameHeader zfh; + + FORWARD_IF_ERROR(ZSTD_getFrameHeader(&zfh, src, srcSize), ""); + if (ZSTD_isError(compressedSize) || decompressedBound =3D=3D ZSTD_= CONTENTSIZE_ERROR) + return ERROR(corruption_detected); + + if (zfh.frameType =3D=3D ZSTD_frame) { + /* Add the frame header to our margin */ + margin +=3D zfh.headerSize; + /* Add the checksum to our margin */ + margin +=3D zfh.checksumFlag ? 4 : 0; + /* Add 3 bytes per block */ + margin +=3D 3 * frameSizeInfo.nbBlocks; + + /* Compute the max block size */ + maxBlockSize =3D MAX(maxBlockSize, zfh.blockSizeMax); + } else { + assert(zfh.frameType =3D=3D ZSTD_skippableFrame); + /* Add the entire skippable frame size to our margin. */ + margin +=3D compressedSize; + } + + assert(srcSize >=3D compressedSize); + src =3D (const BYTE*)src + compressedSize; + srcSize -=3D compressedSize; + } + + /* Add the max block size back to the margin. 
*/ + margin +=3D maxBlockSize; + + return margin; +} =20 /*-************************************************************* * Frame decoding @@ -815,7 +893,7 @@ static size_t ZSTD_setRleBlock(void* dst, size_t dstCap= acity, return regenSize; } =20 -static void ZSTD_DCtx_trace_end(ZSTD_DCtx const* dctx, U64 uncompressedSiz= e, U64 compressedSize, unsigned streaming) +static void ZSTD_DCtx_trace_end(ZSTD_DCtx const* dctx, U64 uncompressedSiz= e, U64 compressedSize, int streaming) { (void)dctx; (void)uncompressedSize; @@ -856,6 +934,10 @@ static size_t ZSTD_decompressFrame(ZSTD_DCtx* dctx, ip +=3D frameHeaderSize; remainingSrcSize -=3D frameHeaderSize; } =20 + /* Shrink the blockSizeMax if enabled */ + if (dctx->maxBlockSizeParam !=3D 0) + dctx->fParams.blockSizeMax =3D MIN(dctx->fParams.blockSizeMax, (un= signed)dctx->maxBlockSizeParam); + /* Loop on each block */ while (1) { BYTE* oBlockEnd =3D oend; @@ -888,7 +970,8 @@ static size_t ZSTD_decompressFrame(ZSTD_DCtx* dctx, switch(blockProperties.blockType) { case bt_compressed: - decodedSize =3D ZSTD_decompressBlock_internal(dctx, op, (size_= t)(oBlockEnd-op), ip, cBlockSize, /* frame */ 1, not_streaming); + assert(dctx->isFrameDecompression =3D=3D 1); + decodedSize =3D ZSTD_decompressBlock_internal(dctx, op, (size_= t)(oBlockEnd-op), ip, cBlockSize, not_streaming); break; case bt_raw : /* Use oend instead of oBlockEnd because this function is safe= to overlap. It uses memmove. */ @@ -901,12 +984,14 @@ static size_t ZSTD_decompressFrame(ZSTD_DCtx* dctx, default: RETURN_ERROR(corruption_detected, "invalid block type"); } - - if (ZSTD_isError(decodedSize)) return decodedSize; - if (dctx->validateChecksum) + FORWARD_IF_ERROR(decodedSize, "Block decompression failure"); + DEBUGLOG(5, "Decompressed block of dSize =3D %u", (unsigned)decode= dSize); + if (dctx->validateChecksum) { xxh64_update(&dctx->xxhState, op, decodedSize); - if (decodedSize !=3D 0) + } + if (decodedSize) /* support dst =3D NULL,0 */ { op +=3D decodedSize; + } assert(ip !=3D NULL); ip +=3D cBlockSize; remainingSrcSize -=3D cBlockSize; @@ -930,12 +1015,15 @@ static size_t ZSTD_decompressFrame(ZSTD_DCtx* dctx, } ZSTD_DCtx_trace_end(dctx, (U64)(op-ostart), (U64)(ip-istart), /* strea= ming */ 0); /* Allow caller to get size read */ + DEBUGLOG(4, "ZSTD_decompressFrame: decompressed frame of size %i, cons= uming %i bytes of input", (int)(op-ostart), (int)(ip - (const BYTE*)*srcPtr= )); *srcPtr =3D ip; *srcSizePtr =3D remainingSrcSize; return (size_t)(op-ostart); } =20 -static size_t ZSTD_decompressMultiFrame(ZSTD_DCtx* dctx, +static +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR +size_t ZSTD_decompressMultiFrame(ZSTD_DCtx* dctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize, const void* dict, size_t dictSize, @@ -955,17 +1043,18 @@ static size_t ZSTD_decompressMultiFrame(ZSTD_DCtx* d= ctx, while (srcSize >=3D ZSTD_startingInputLength(dctx->format)) { =20 =20 - { U32 const magicNumber =3D MEM_readLE32(src); - DEBUGLOG(4, "reading magic number %08X (expecting %08X)", - (unsigned)magicNumber, ZSTD_MAGICNUMBER); + if (dctx->format =3D=3D ZSTD_f_zstd1 && srcSize >=3D 4) { + U32 const magicNumber =3D MEM_readLE32(src); + DEBUGLOG(5, "reading magic number %08X", (unsigned)magicNumber= ); if ((magicNumber & ZSTD_MAGIC_SKIPPABLE_MASK) =3D=3D ZSTD_MAGI= C_SKIPPABLE_START) { + /* skippable frame detected : skip it */ size_t const skippableSize =3D readSkippableFrameSize(src,= srcSize); - FORWARD_IF_ERROR(skippableSize, "readSkippableFrameSize fa= iled"); + FORWARD_IF_ERROR(skippableSize, 
"invalid skippable frame"); assert(skippableSize <=3D srcSize); =20 src =3D (const BYTE *)src + skippableSize; srcSize -=3D skippableSize; - continue; + continue; /* check next frame */ } } =20 if (ddict) { @@ -1061,8 +1150,8 @@ size_t ZSTD_decompress(void* dst, size_t dstCapacity,= const void* src, size_t sr size_t ZSTD_nextSrcSizeToDecompress(ZSTD_DCtx* dctx) { return dctx->expect= ed; } =20 /* - * Similar to ZSTD_nextSrcSizeToDecompress(), but when a block input can b= e streamed, - * we allow taking a partial block as the input. Currently only raw uncomp= ressed blocks can + * Similar to ZSTD_nextSrcSizeToDecompress(), but when a block input can b= e streamed, we + * allow taking a partial block as the input. Currently only raw uncompres= sed blocks can * be streamed. * * For blocks that can be streamed, this allows us to reduce the latency u= ntil we produce @@ -1181,7 +1270,8 @@ size_t ZSTD_decompressContinue(ZSTD_DCtx* dctx, void*= dst, size_t dstCapacity, c { case bt_compressed: DEBUGLOG(5, "ZSTD_decompressContinue: case bt_compressed"); - rSize =3D ZSTD_decompressBlock_internal(dctx, dst, dstCapa= city, src, srcSize, /* frame */ 1, is_streaming); + assert(dctx->isFrameDecompression =3D=3D 1); + rSize =3D ZSTD_decompressBlock_internal(dctx, dst, dstCapa= city, src, srcSize, is_streaming); dctx->expected =3D 0; /* Streaming not supported */ break; case bt_raw : @@ -1250,6 +1340,7 @@ size_t ZSTD_decompressContinue(ZSTD_DCtx* dctx, void*= dst, size_t dstCapacity, c case ZSTDds_decodeSkippableHeader: assert(src !=3D NULL); assert(srcSize <=3D ZSTD_SKIPPABLEHEADERSIZE); + assert(dctx->format !=3D ZSTD_f_zstd1_magicless); ZSTD_memcpy(dctx->headerBuffer + (ZSTD_SKIPPABLEHEADERSIZE - srcSi= ze), src, srcSize); /* complete skippable header */ dctx->expected =3D MEM_readLE32(dctx->headerBuffer + ZSTD_FRAMEIDS= IZE); /* note : dctx->expected can grow seriously large, beyond local buf= fer size */ dctx->stage =3D ZSTDds_skipFrame; @@ -1262,7 +1353,7 @@ size_t ZSTD_decompressContinue(ZSTD_DCtx* dctx, void*= dst, size_t dstCapacity, c =20 default: assert(0); /* impossible */ - RETURN_ERROR(GENERIC, "impossible to reach"); /* some compiler r= equire default to do something */ + RETURN_ERROR(GENERIC, "impossible to reach"); /* some compilers = require default to do something */ } } =20 @@ -1303,11 +1394,11 @@ ZSTD_loadDEntropy(ZSTD_entropyDTables_t* entropy, /* in minimal huffman, we always use X1 variants */ size_t const hSize =3D HUF_readDTableX1_wksp(entropy->hufTable, dictPtr, dictEnd - dictPtr, - workspace, workspaceSize); + workspace, workspaceSize, = /* flags */ 0); #else size_t const hSize =3D HUF_readDTableX2_wksp(entropy->hufTable, dictPtr, (size_t)(dictEnd = - dictPtr), - workspace, workspaceSize); + workspace, workspaceSize, = /* flags */ 0); #endif RETURN_ERROR_IF(HUF_isError(hSize), dictionary_corrupted, ""); dictPtr +=3D hSize; @@ -1403,10 +1494,11 @@ size_t ZSTD_decompressBegin(ZSTD_DCtx* dctx) dctx->prefixStart =3D NULL; dctx->virtualStart =3D NULL; dctx->dictEnd =3D NULL; - dctx->entropy.hufTable[0] =3D (HUF_DTable)((HufLog)*0x1000001); /* co= ver both little and big endian */ + dctx->entropy.hufTable[0] =3D (HUF_DTable)((ZSTD_HUFFDTABLE_CAPACITY_L= OG)*0x1000001); /* cover both little and big endian */ dctx->litEntropy =3D dctx->fseEntropy =3D 0; dctx->dictID =3D 0; dctx->bType =3D bt_reserved; + dctx->isFrameDecompression =3D 1; ZSTD_STATIC_ASSERT(sizeof(dctx->entropy.rep) =3D=3D sizeof(repStartVal= ue)); ZSTD_memcpy(dctx->entropy.rep, repStartValue, sizeof(repStartValue)); = 
/* initial repcodes */ dctx->LLTptr =3D dctx->entropy.LLTable; @@ -1465,7 +1557,7 @@ unsigned ZSTD_getDictID_fromDict(const void* dict, si= ze_t dictSize) * This could for one of the following reasons : * - The frame does not require a dictionary (most common case). * - The frame was built with dictID intentionally removed. - * Needed dictionary is a hidden information. + * Needed dictionary is a hidden piece of information. * Note : this use case also happens when using a non-conformant dictio= nary. * - `srcSize` is too small, and as a result, frame header could not be d= ecoded. * Note : possible if `srcSize < ZSTD_FRAMEHEADERSIZE_MAX`. @@ -1474,7 +1566,7 @@ unsigned ZSTD_getDictID_fromDict(const void* dict, si= ze_t dictSize) * ZSTD_getFrameHeader(), which will provide a more precise error code. */ unsigned ZSTD_getDictID_fromFrame(const void* src, size_t srcSize) { - ZSTD_frameHeader zfp =3D { 0, 0, 0, ZSTD_frame, 0, 0, 0 }; + ZSTD_FrameHeader zfp =3D { 0, 0, 0, ZSTD_frame, 0, 0, 0, 0, 0 }; size_t const hError =3D ZSTD_getFrameHeader(&zfp, src, srcSize); if (ZSTD_isError(hError)) return 0; return zfp.dictID; @@ -1581,7 +1673,9 @@ size_t ZSTD_initDStream_usingDict(ZSTD_DStream* zds, = const void* dict, size_t di size_t ZSTD_initDStream(ZSTD_DStream* zds) { DEBUGLOG(4, "ZSTD_initDStream"); - return ZSTD_initDStream_usingDDict(zds, NULL); + FORWARD_IF_ERROR(ZSTD_DCtx_reset(zds, ZSTD_reset_session_only), ""); + FORWARD_IF_ERROR(ZSTD_DCtx_refDDict(zds, NULL), ""); + return ZSTD_startingInputLength(zds->format); } =20 /* ZSTD_initDStream_usingDDict() : @@ -1589,6 +1683,7 @@ size_t ZSTD_initDStream(ZSTD_DStream* zds) * this function cannot fail */ size_t ZSTD_initDStream_usingDDict(ZSTD_DStream* dctx, const ZSTD_DDict* d= dict) { + DEBUGLOG(4, "ZSTD_initDStream_usingDDict"); FORWARD_IF_ERROR( ZSTD_DCtx_reset(dctx, ZSTD_reset_session_only) , ""); FORWARD_IF_ERROR( ZSTD_DCtx_refDDict(dctx, ddict) , ""); return ZSTD_startingInputLength(dctx->format); @@ -1599,6 +1694,7 @@ size_t ZSTD_initDStream_usingDDict(ZSTD_DStream* dctx= , const ZSTD_DDict* ddict) * this function cannot fail */ size_t ZSTD_resetDStream(ZSTD_DStream* dctx) { + DEBUGLOG(4, "ZSTD_resetDStream"); FORWARD_IF_ERROR(ZSTD_DCtx_reset(dctx, ZSTD_reset_session_only), ""); return ZSTD_startingInputLength(dctx->format); } @@ -1670,6 +1766,15 @@ ZSTD_bounds ZSTD_dParam_getBounds(ZSTD_dParameter dP= aram) bounds.lowerBound =3D (int)ZSTD_rmd_refSingleDDict; bounds.upperBound =3D (int)ZSTD_rmd_refMultipleDDicts; return bounds; + case ZSTD_d_disableHuffmanAssembly: + bounds.lowerBound =3D 0; + bounds.upperBound =3D 1; + return bounds; + case ZSTD_d_maxBlockSize: + bounds.lowerBound =3D ZSTD_BLOCKSIZE_MAX_MIN; + bounds.upperBound =3D ZSTD_BLOCKSIZE_MAX; + return bounds; + default:; } bounds.error =3D ERROR(parameter_unsupported); @@ -1710,6 +1815,12 @@ size_t ZSTD_DCtx_getParameter(ZSTD_DCtx* dctx, ZSTD_= dParameter param, int* value case ZSTD_d_refMultipleDDicts: *value =3D (int)dctx->refMultipleDDicts; return 0; + case ZSTD_d_disableHuffmanAssembly: + *value =3D (int)dctx->disableHufAsm; + return 0; + case ZSTD_d_maxBlockSize: + *value =3D dctx->maxBlockSizeParam; + return 0; default:; } RETURN_ERROR(parameter_unsupported, ""); @@ -1743,6 +1854,14 @@ size_t ZSTD_DCtx_setParameter(ZSTD_DCtx* dctx, ZSTD_= dParameter dParam, int value } dctx->refMultipleDDicts =3D (ZSTD_refMultipleDDicts_e)value; return 0; + case ZSTD_d_disableHuffmanAssembly: + CHECK_DBOUNDS(ZSTD_d_disableHuffmanAssembly, value); + dctx->disableHufAsm =3D value !=3D 0; + 
return 0; + case ZSTD_d_maxBlockSize: + if (value !=3D 0) CHECK_DBOUNDS(ZSTD_d_maxBlockSize, value); + dctx->maxBlockSizeParam =3D value; + return 0; default:; } RETURN_ERROR(parameter_unsupported, ""); @@ -1754,6 +1873,7 @@ size_t ZSTD_DCtx_reset(ZSTD_DCtx* dctx, ZSTD_ResetDir= ective reset) || (reset =3D=3D ZSTD_reset_session_and_parameters) ) { dctx->streamStage =3D zdss_init; dctx->noForwardProgress =3D 0; + dctx->isFrameDecompression =3D 1; } if ( (reset =3D=3D ZSTD_reset_parameters) || (reset =3D=3D ZSTD_reset_session_and_parameters) ) { @@ -1770,11 +1890,17 @@ size_t ZSTD_sizeof_DStream(const ZSTD_DStream* dctx) return ZSTD_sizeof_DCtx(dctx); } =20 -size_t ZSTD_decodingBufferSize_min(unsigned long long windowSize, unsigned= long long frameContentSize) +static size_t ZSTD_decodingBufferSize_internal(unsigned long long windowSi= ze, unsigned long long frameContentSize, size_t blockSizeMax) { - size_t const blockSize =3D (size_t) MIN(windowSize, ZSTD_BLOCKSIZE_MAX= ); - /* space is needed to store the litbuffer after the output of a given = block without stomping the extDict of a previous run, as well as to cover b= oth windows against wildcopy*/ - unsigned long long const neededRBSize =3D windowSize + blockSize + ZST= D_BLOCKSIZE_MAX + (WILDCOPY_OVERLENGTH * 2); + size_t const blockSize =3D MIN((size_t)MIN(windowSize, ZSTD_BLOCKSIZE_= MAX), blockSizeMax); + /* We need blockSize + WILDCOPY_OVERLENGTH worth of buffer so that if = a block + * ends at windowSize + WILDCOPY_OVERLENGTH + 1 bytes, we can start wr= iting + * the block at the beginning of the output buffer, and maintain a ful= l window. + * + * We need another blockSize worth of buffer so that we can store split + * literals at the end of the block without overwriting the extDict wi= ndow. 
+ */ + unsigned long long const neededRBSize =3D windowSize + (blockSize * 2)= + (WILDCOPY_OVERLENGTH * 2); unsigned long long const neededSize =3D MIN(frameContentSize, neededRB= Size); size_t const minRBSize =3D (size_t) neededSize; RETURN_ERROR_IF((unsigned long long)minRBSize !=3D neededSize, @@ -1782,6 +1908,11 @@ size_t ZSTD_decodingBufferSize_min(unsigned long lon= g windowSize, unsigned long return minRBSize; } =20 +size_t ZSTD_decodingBufferSize_min(unsigned long long windowSize, unsigned= long long frameContentSize) +{ + return ZSTD_decodingBufferSize_internal(windowSize, frameContentSize, = ZSTD_BLOCKSIZE_MAX); +} + size_t ZSTD_estimateDStreamSize(size_t windowSize) { size_t const blockSize =3D MIN(windowSize, ZSTD_BLOCKSIZE_MAX); @@ -1793,7 +1924,7 @@ size_t ZSTD_estimateDStreamSize(size_t windowSize) size_t ZSTD_estimateDStreamSize_fromFrame(const void* src, size_t srcSize) { U32 const windowSizeMax =3D 1U << ZSTD_WINDOWLOG_MAX; /* note : shou= ld be user-selectable, but requires an additional parameter (or a dctx) */ - ZSTD_frameHeader zfh; + ZSTD_FrameHeader zfh; size_t const err =3D ZSTD_getFrameHeader(&zfh, src, srcSize); if (ZSTD_isError(err)) return err; RETURN_ERROR_IF(err>0, srcSize_wrong, ""); @@ -1888,6 +2019,7 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_= outBuffer* output, ZSTD_inB U32 someMoreWork =3D 1; =20 DEBUGLOG(5, "ZSTD_decompressStream"); + assert(zds !=3D NULL); RETURN_ERROR_IF( input->pos > input->size, srcSize_wrong, @@ -1918,7 +2050,6 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_= outBuffer* output, ZSTD_inB if (zds->refMultipleDDicts && zds->ddictSet) { ZSTD_DCtx_selectFrameDDict(zds); } - DEBUGLOG(5, "header size : %u", (U32)hSize); if (ZSTD_isError(hSize)) { return hSize; /* error */ } @@ -1932,6 +2063,11 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD= _outBuffer* output, ZSTD_inB zds->lhSize +=3D remainingInput; } input->pos =3D input->size; + /* check first few bytes */ + FORWARD_IF_ERROR( + ZSTD_getFrameHeader_advanced(&zds->fParams, zd= s->headerBuffer, zds->lhSize, zds->format), + "First few bytes detected incorrect" ); + /* return hint input size */ return (MAX((size_t)ZSTD_FRAMEHEADERSIZE_MIN(zds->= format), hSize) - zds->lhSize) + ZSTD_blockHeaderSize; /* remaining heade= r bytes + next block header */ } assert(ip !=3D NULL); @@ -1943,14 +2079,15 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZST= D_outBuffer* output, ZSTD_inB if (zds->fParams.frameContentSize !=3D ZSTD_CONTENTSIZE_UNKNOWN && zds->fParams.frameType !=3D ZSTD_skippableFrame && (U64)(size_t)(oend-op) >=3D zds->fParams.frameContentSi= ze) { - size_t const cSize =3D ZSTD_findFrameCompressedSize(istart= , (size_t)(iend-istart)); + size_t const cSize =3D ZSTD_findFrameCompressedSize_advanc= ed(istart, (size_t)(iend-istart), zds->format); if (cSize <=3D (size_t)(iend-istart)) { /* shortcut : using single-pass mode */ size_t const decompressedSize =3D ZSTD_decompress_usin= gDDict(zds, op, (size_t)(oend-op), istart, cSize, ZSTD_getDDict(zds)); if (ZSTD_isError(decompressedSize)) return decompresse= dSize; - DEBUGLOG(4, "shortcut to single-pass ZSTD_decompress_u= singDDict()") + DEBUGLOG(4, "shortcut to single-pass ZSTD_decompress_u= singDDict()"); + assert(istart !=3D NULL); ip =3D istart + cSize; - op +=3D decompressedSize; + op =3D op ? 
op + decompressedSize : op; /* can occur i= f frameContentSize =3D 0 (empty frame) */ zds->expected =3D 0; zds->streamStage =3D zdss_init; someMoreWork =3D 0; @@ -1969,7 +2106,8 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_= outBuffer* output, ZSTD_inB DEBUGLOG(4, "Consume header"); FORWARD_IF_ERROR(ZSTD_decompressBegin_usingDDict(zds, ZSTD_get= DDict(zds)), ""); =20 - if ((MEM_readLE32(zds->headerBuffer) & ZSTD_MAGIC_SKIPPABLE_MA= SK) =3D=3D ZSTD_MAGIC_SKIPPABLE_START) { /* skippable frame */ + if (zds->format =3D=3D ZSTD_f_zstd1 + && (MEM_readLE32(zds->headerBuffer) & ZSTD_MAGIC_SKIPPABLE= _MASK) =3D=3D ZSTD_MAGIC_SKIPPABLE_START) { /* skippable frame */ zds->expected =3D MEM_readLE32(zds->headerBuffer + ZSTD_FR= AMEIDSIZE); zds->stage =3D ZSTDds_skipFrame; } else { @@ -1985,11 +2123,13 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZST= D_outBuffer* output, ZSTD_inB zds->fParams.windowSize =3D MAX(zds->fParams.windowSize, 1U <<= ZSTD_WINDOWLOG_ABSOLUTEMIN); RETURN_ERROR_IF(zds->fParams.windowSize > zds->maxWindowSize, frameParameter_windowTooLarge, ""); + if (zds->maxBlockSizeParam !=3D 0) + zds->fParams.blockSizeMax =3D MIN(zds->fParams.blockSizeMa= x, (unsigned)zds->maxBlockSizeParam); =20 /* Adapt buffer sizes to frame header instructions */ { size_t const neededInBuffSize =3D MAX(zds->fParams.blockSi= zeMax, 4 /* frame checksum */); size_t const neededOutBuffSize =3D zds->outBufferMode =3D= =3D ZSTD_bm_buffered - ? ZSTD_decodingBufferSize_min(zds->fParams.windowS= ize, zds->fParams.frameContentSize) + ? ZSTD_decodingBufferSize_internal(zds->fParams.wi= ndowSize, zds->fParams.frameContentSize, zds->fParams.blockSizeMax) : 0; =20 ZSTD_DCtx_updateOversizedDuration(zds, neededInBuffSize, n= eededOutBuffSize); @@ -2034,6 +2174,7 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_= outBuffer* output, ZSTD_inB } if ((size_t)(iend-ip) >=3D neededInSize) { /* decode dire= ctly from src */ FORWARD_IF_ERROR(ZSTD_decompressContinueStream(zds, &o= p, oend, ip, neededInSize), ""); + assert(ip !=3D NULL); ip +=3D neededInSize; /* Function modifies the stage so we must break */ break; @@ -2048,7 +2189,7 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_= outBuffer* output, ZSTD_inB int const isSkipFrame =3D ZSTD_isSkipFrame(zds); size_t loadedSize; /* At this point we shouldn't be decompressing a block tha= t we can stream. 
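
For orientation, the state machine being reworked here implements the usual contract of ZSTD_decompressStream(): feed input, drain output, a return of 0 marks a fully decoded and flushed frame, and any other non-error value is a hint for how much input to supply next. A minimal user-space caller might look like the sketch below (fixed 4 KB buffers for brevity; real callers would size them with ZSTD_DStreamInSize()/ZSTD_DStreamOutSize()):

  #include <stdio.h>
  #include <zstd.h>

  static int demo_decompress(FILE* fin, FILE* fout)
  {
      ZSTD_DCtx* const dctx = ZSTD_createDCtx();
      char inBuf[4096], outBuf[4096];
      size_t ret = 0, readSz;
      if (dctx == NULL) return -1;
      while ((readSz = fread(inBuf, 1, sizeof(inBuf), fin)) != 0) {
          ZSTD_inBuffer input = { inBuf, readSz, 0 };
          while (input.pos < input.size) {
              ZSTD_outBuffer output = { outBuf, sizeof(outBuf), 0 };
              ret = ZSTD_decompressStream(dctx, &output, &input);
              if (ZSTD_isError(ret)) goto done;
              fwrite(outBuf, 1, output.pos, fout);
          }
      }
  done:
      ZSTD_freeDCtx(dctx);
      /* ret == 0 means the last frame was fully decoded and flushed */
      return (ZSTD_isError(ret) || ret != 0) ? -1 : 0;
  }
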
*/ - assert(neededInSize =3D=3D ZSTD_nextSrcSizeToDecompressWit= hInputSize(zds, iend - ip)); + assert(neededInSize =3D=3D ZSTD_nextSrcSizeToDecompressWit= hInputSize(zds, (size_t)(iend - ip))); if (isSkipFrame) { loadedSize =3D MIN(toLoad, (size_t)(iend-ip)); } else { @@ -2057,8 +2198,11 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD= _outBuffer* output, ZSTD_inB "should never happen"); loadedSize =3D ZSTD_limitCopy(zds->inBuff + zds->inPos= , toLoad, ip, (size_t)(iend-ip)); } - ip +=3D loadedSize; - zds->inPos +=3D loadedSize; + if (loadedSize !=3D 0) { + /* ip may be NULL */ + ip +=3D loadedSize; + zds->inPos +=3D loadedSize; + } if (loadedSize < toLoad) { someMoreWork =3D 0; break; } = /* not enough input, wait for more */ =20 /* decode loaded input */ @@ -2068,14 +2212,17 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZST= D_outBuffer* output, ZSTD_inB break; } case zdss_flush: - { size_t const toFlushSize =3D zds->outEnd - zds->outStart; + { + size_t const toFlushSize =3D zds->outEnd - zds->outStart; size_t const flushedSize =3D ZSTD_limitCopy(op, (size_t)(o= end-op), zds->outBuff + zds->outStart, toFlushSize); - op +=3D flushedSize; + + op =3D op ? op + flushedSize : op; + zds->outStart +=3D flushedSize; if (flushedSize =3D=3D toFlushSize) { /* flush completed = */ zds->streamStage =3D zdss_read; if ( (zds->outBuffSize < zds->fParams.frameContentSize) - && (zds->outStart + zds->fParams.blockSizeMax > zds-= >outBuffSize) ) { + && (zds->outStart + zds->fParams.blockSizeMax > zd= s->outBuffSize) ) { DEBUGLOG(5, "restart filling outBuff from beginnin= g (left:%i, needed:%u)", (int)(zds->outBuffSize - zds->outStart), (U32)zds->fParams.blockSizeMax); @@ -2089,7 +2236,7 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_= outBuffer* output, ZSTD_inB =20 default: assert(0); /* impossible */ - RETURN_ERROR(GENERIC, "impossible to reach"); /* some compil= er require default to do something */ + RETURN_ERROR(GENERIC, "impossible to reach"); /* some compil= ers require default to do something */ } } =20 /* result */ @@ -2102,8 +2249,8 @@ size_t ZSTD_decompressStream(ZSTD_DStream* zds, ZSTD_= outBuffer* output, ZSTD_inB if ((ip=3D=3Distart) && (op=3D=3Dostart)) { /* no forward progress */ zds->noForwardProgress ++; if (zds->noForwardProgress >=3D ZSTD_NO_FORWARD_PROGRESS_MAX) { - RETURN_ERROR_IF(op=3D=3Doend, dstSize_tooSmall, ""); - RETURN_ERROR_IF(ip=3D=3Diend, srcSize_wrong, ""); + RETURN_ERROR_IF(op=3D=3Doend, noForwardProgress_destFull, ""); + RETURN_ERROR_IF(ip=3D=3Diend, noForwardProgress_inputEmpty, ""= ); assert(0); } } else { @@ -2140,11 +2287,17 @@ size_t ZSTD_decompressStream_simpleArgs ( void* dst, size_t dstCapacity, size_t* dstPos, const void* src, size_t srcSize, size_t* srcPos) { - ZSTD_outBuffer output =3D { dst, dstCapacity, *dstPos }; - ZSTD_inBuffer input =3D { src, srcSize, *srcPos }; - /* ZSTD_compress_generic() will check validity of dstPos and srcPos */ - size_t const cErr =3D ZSTD_decompressStream(dctx, &output, &input); - *dstPos =3D output.pos; - *srcPos =3D input.pos; - return cErr; + ZSTD_outBuffer output; + ZSTD_inBuffer input; + output.dst =3D dst; + output.size =3D dstCapacity; + output.pos =3D *dstPos; + input.src =3D src; + input.size =3D srcSize; + input.pos =3D *srcPos; + { size_t const cErr =3D ZSTD_decompressStream(dctx, &output, &input); + *dstPos =3D output.pos; + *srcPos =3D input.pos; + return cErr; + } } diff --git a/lib/zstd/decompress/zstd_decompress_block.c b/lib/zstd/decompr= ess/zstd_decompress_block.c index 
c1913b8e7c89..710eb0ffd5a3 100644 --- a/lib/zstd/decompress/zstd_decompress_block.c +++ b/lib/zstd/decompress/zstd_decompress_block.c @@ -1,5 +1,6 @@ +// SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause /* - * Copyright (c) Yann Collet, Facebook, Inc. + * Copyright (c) Meta Platforms, Inc. and affiliates. * All rights reserved. * * This source code is licensed under both the BSD-style license (found in= the @@ -20,12 +21,12 @@ #include "../common/mem.h" /* low level memory routines */ #define FSE_STATIC_LINKING_ONLY #include "../common/fse.h" -#define HUF_STATIC_LINKING_ONLY #include "../common/huf.h" #include "../common/zstd_internal.h" #include "zstd_decompress_internal.h" /* ZSTD_DCtx */ #include "zstd_ddict.h" /* ZSTD_DDictDictContent */ #include "zstd_decompress_block.h" +#include "../common/bits.h" /* ZSTD_highbit32 */ =20 /*_******************************************************* * Macros @@ -51,6 +52,13 @@ static void ZSTD_copy4(void* dst, const void* src) { ZST= D_memcpy(dst, src, 4); } * Block decoding ***************************************************************/ =20 +static size_t ZSTD_blockSizeMax(ZSTD_DCtx const* dctx) +{ + size_t const blockSizeMax =3D dctx->isFrameDecompression ? dctx->fPara= ms.blockSizeMax : ZSTD_BLOCKSIZE_MAX; + assert(blockSizeMax <=3D ZSTD_BLOCKSIZE_MAX); + return blockSizeMax; +} + /*! ZSTD_getcBlockSize() : * Provides the size of compressed block from block header `src` */ size_t ZSTD_getcBlockSize(const void* src, size_t srcSize, @@ -73,41 +81,49 @@ size_t ZSTD_getcBlockSize(const void* src, size_t srcSi= ze, static void ZSTD_allocateLiteralsBuffer(ZSTD_DCtx* dctx, void* const dst, = const size_t dstCapacity, const size_t litSize, const streaming_operation streaming, const size_t expectedWriteSize, c= onst unsigned splitImmediately) { - if (streaming =3D=3D not_streaming && dstCapacity > ZSTD_BLOCKSIZE_MAX= + WILDCOPY_OVERLENGTH + litSize + WILDCOPY_OVERLENGTH) - { - /* room for litbuffer to fit without read faulting */ - dctx->litBuffer =3D (BYTE*)dst + ZSTD_BLOCKSIZE_MAX + WILDCOPY_OVE= RLENGTH; + size_t const blockSizeMax =3D ZSTD_blockSizeMax(dctx); + assert(litSize <=3D blockSizeMax); + assert(dctx->isFrameDecompression || streaming =3D=3D not_streaming); + assert(expectedWriteSize <=3D blockSizeMax); + if (streaming =3D=3D not_streaming && dstCapacity > blockSizeMax + WIL= DCOPY_OVERLENGTH + litSize + WILDCOPY_OVERLENGTH) { + /* If we aren't streaming, we can just put the literals after the = output + * of the current block. We don't need to worry about overwriting = the + * extDict of our window, because it doesn't exist. + * So if we have space after the end of the block, just put it the= re. + */ + dctx->litBuffer =3D (BYTE*)dst + blockSizeMax + WILDCOPY_OVERLENGT= H; dctx->litBufferEnd =3D dctx->litBuffer + litSize; dctx->litBufferLocation =3D ZSTD_in_dst; - } - else if (litSize > ZSTD_LITBUFFEREXTRASIZE) - { - /* won't fit in litExtraBuffer, so it will be split between end of= dst and extra buffer */ + } else if (litSize <=3D ZSTD_LITBUFFEREXTRASIZE) { + /* Literals fit entirely within the extra buffer, put them there t= o avoid + * having to split the literals. + */ + dctx->litBuffer =3D dctx->litExtraBuffer; + dctx->litBufferEnd =3D dctx->litBuffer + litSize; + dctx->litBufferLocation =3D ZSTD_not_in_dst; + } else { + assert(blockSizeMax > ZSTD_LITBUFFEREXTRASIZE); + /* Literals must be split between the output block and the extra l= it + * buffer. 
We fill the extra lit buffer with the tail of the liter= als, + * and put the rest of the literals at the end of the block, with + * WILDCOPY_OVERLENGTH of buffer room to allow for overreads. + * This MUST not write more than our maxBlockSize beyond dst, beca= use in + * streaming mode, that could overwrite part of our extDict window. + */ if (splitImmediately) { /* won't fit in litExtraBuffer, so it will be split between en= d of dst and extra buffer */ dctx->litBuffer =3D (BYTE*)dst + expectedWriteSize - litSize += ZSTD_LITBUFFEREXTRASIZE - WILDCOPY_OVERLENGTH; dctx->litBufferEnd =3D dctx->litBuffer + litSize - ZSTD_LITBUF= FEREXTRASIZE; - } - else { - /* initially this will be stored entirely in dst during huffma= n decoding, it will partially shifted to litExtraBuffer after */ + } else { + /* initially this will be stored entirely in dst during huffma= n decoding, it will partially be shifted to litExtraBuffer after */ dctx->litBuffer =3D (BYTE*)dst + expectedWriteSize - litSize; dctx->litBufferEnd =3D (BYTE*)dst + expectedWriteSize; } dctx->litBufferLocation =3D ZSTD_split; - } - else - { - /* fits entirely within litExtraBuffer, so no split is necessary */ - dctx->litBuffer =3D dctx->litExtraBuffer; - dctx->litBufferEnd =3D dctx->litBuffer + litSize; - dctx->litBufferLocation =3D ZSTD_not_in_dst; + assert(dctx->litBufferEnd <=3D (BYTE*)dst + expectedWriteSize); } } =20 -/* Hidden declaration for fullbench */ -size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, - const void* src, size_t srcSize, - void* dst, size_t dstCapacity, const streaming_o= peration streaming); /*! ZSTD_decodeLiteralsBlock() : * Where it is possible to do so without being stomped by the output durin= g decompression, the literals block will be stored * in the dstBuffer. If there is room to do so, it will be stored in full= in the excess dst space after where the current @@ -116,7 +132,7 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, * * @return : nb of bytes read from src (< srcSize ) * note : symbol not declared but exposed for fullbench */ -size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, +static size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, const void* src, size_t srcSize, /* note : src= Size < BLOCKSIZE */ void* dst, size_t dstCapacity, const streaming_o= peration streaming) { @@ -124,7 +140,8 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, RETURN_ERROR_IF(srcSize < MIN_CBLOCK_SIZE, corruption_detected, ""); =20 { const BYTE* const istart =3D (const BYTE*) src; - symbolEncodingType_e const litEncType =3D (symbolEncodingType_e)(i= start[0] & 3); + SymbolEncodingType_e const litEncType =3D (SymbolEncodingType_e)(i= start[0] & 3); + size_t const blockSizeMax =3D ZSTD_blockSizeMax(dctx); =20 switch(litEncType) { @@ -134,13 +151,16 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, ZSTD_FALLTHROUGH; =20 case set_compressed: - RETURN_ERROR_IF(srcSize < 5, corruption_detected, "srcSize >= =3D MIN_CBLOCK_SIZE =3D=3D 3; here we need up to 5 for case 3"); + RETURN_ERROR_IF(srcSize < 5, corruption_detected, "srcSize >= =3D MIN_CBLOCK_SIZE =3D=3D 2; here we need up to 5 for case 3"); { size_t lhSize, litSize, litCSize; U32 singleStream=3D0; U32 const lhlCode =3D (istart[0] >> 2) & 3; U32 const lhc =3D MEM_readLE32(istart); size_t hufSuccess; - size_t expectedWriteSize =3D MIN(ZSTD_BLOCKSIZE_MAX, dstCa= pacity); + size_t expectedWriteSize =3D MIN(blockSizeMax, dstCapacity= ); + int const flags =3D 0 + | (ZSTD_DCtx_get_bmi2(dctx) ? HUF_flags_bmi2 : 0) + | (dctx->disableHufAsm ? 
HUF_flags_disableAsm : 0); switch(lhlCode) { case 0: case 1: default: /* note : default is impossible= , since lhlCode into [0..3] */ @@ -164,7 +184,11 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, break; } RETURN_ERROR_IF(litSize > 0 && dst =3D=3D NULL, dstSize_to= oSmall, "NULL not handled"); - RETURN_ERROR_IF(litSize > ZSTD_BLOCKSIZE_MAX, corruption_d= etected, ""); + RETURN_ERROR_IF(litSize > blockSizeMax, corruption_detecte= d, ""); + if (!singleStream) + RETURN_ERROR_IF(litSize < MIN_LITERALS_FOR_4_STREAMS, = literals_headerWrong, + "Not enough literals (%zu) for the 4-streams mode = (min %u)", + litSize, MIN_LITERALS_FOR_4_STREAMS); RETURN_ERROR_IF(litCSize + lhSize > srcSize, corruption_de= tected, ""); RETURN_ERROR_IF(expectedWriteSize < litSize , dstSize_tooS= mall, ""); ZSTD_allocateLiteralsBuffer(dctx, dst, dstCapacity, litSiz= e, streaming, expectedWriteSize, 0); @@ -176,13 +200,14 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, =20 if (litEncType=3D=3Dset_repeat) { if (singleStream) { - hufSuccess =3D HUF_decompress1X_usingDTable_bmi2( + hufSuccess =3D HUF_decompress1X_usingDTable( dctx->litBuffer, litSize, istart+lhSize, litCS= ize, - dctx->HUFptr, ZSTD_DCtx_get_bmi2(dctx)); + dctx->HUFptr, flags); } else { - hufSuccess =3D HUF_decompress4X_usingDTable_bmi2( + assert(litSize >=3D MIN_LITERALS_FOR_4_STREAMS); + hufSuccess =3D HUF_decompress4X_usingDTable( dctx->litBuffer, litSize, istart+lhSize, litCS= ize, - dctx->HUFptr, ZSTD_DCtx_get_bmi2(dctx)); + dctx->HUFptr, flags); } } else { if (singleStream) { @@ -190,26 +215,28 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, hufSuccess =3D HUF_decompress1X_DCtx_wksp( dctx->entropy.hufTable, dctx->litBuffer, litSi= ze, istart+lhSize, litCSize, dctx->workspace, - sizeof(dctx->workspace)); + sizeof(dctx->workspace), flags); #else - hufSuccess =3D HUF_decompress1X1_DCtx_wksp_bmi2( + hufSuccess =3D HUF_decompress1X1_DCtx_wksp( dctx->entropy.hufTable, dctx->litBuffer, litSi= ze, istart+lhSize, litCSize, dctx->workspace, - sizeof(dctx->workspace), ZSTD_DCtx_get_bmi2(dc= tx)); + sizeof(dctx->workspace), flags); #endif } else { - hufSuccess =3D HUF_decompress4X_hufOnly_wksp_bmi2( + hufSuccess =3D HUF_decompress4X_hufOnly_wksp( dctx->entropy.hufTable, dctx->litBuffer, litSi= ze, istart+lhSize, litCSize, dctx->workspace, - sizeof(dctx->workspace), ZSTD_DCtx_get_bmi2(dc= tx)); + sizeof(dctx->workspace), flags); } } if (dctx->litBufferLocation =3D=3D ZSTD_split) { + assert(litSize > ZSTD_LITBUFFEREXTRASIZE); ZSTD_memcpy(dctx->litExtraBuffer, dctx->litBufferEnd -= ZSTD_LITBUFFEREXTRASIZE, ZSTD_LITBUFFEREXTRASIZE); ZSTD_memmove(dctx->litBuffer + ZSTD_LITBUFFEREXTRASIZE= - WILDCOPY_OVERLENGTH, dctx->litBuffer, litSize - ZSTD_LITBUFFEREXTRASIZE); dctx->litBuffer +=3D ZSTD_LITBUFFEREXTRASIZE - WILDCOP= Y_OVERLENGTH; dctx->litBufferEnd -=3D WILDCOPY_OVERLENGTH; + assert(dctx->litBufferEnd <=3D (BYTE*)dst + blockSizeM= ax); } =20 RETURN_ERROR_IF(HUF_isError(hufSuccess), corruption_detect= ed, ""); @@ -224,7 +251,7 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, case set_basic: { size_t litSize, lhSize; U32 const lhlCode =3D ((istart[0]) >> 2) & 3; - size_t expectedWriteSize =3D MIN(ZSTD_BLOCKSIZE_MAX, dstCa= pacity); + size_t expectedWriteSize =3D MIN(blockSizeMax, dstCapacity= ); switch(lhlCode) { case 0: case 2: default: /* note : default is impossible= , since lhlCode into [0..3] */ @@ -237,11 +264,13 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, break; case 3: lhSize =3D 3; + RETURN_ERROR_IF(srcSize<3, corruption_detected, 
"srcSi= ze >=3D MIN_CBLOCK_SIZE =3D=3D 2; here we need lhSize =3D 3"); litSize =3D MEM_readLE24(istart) >> 4; break; } =20 RETURN_ERROR_IF(litSize > 0 && dst =3D=3D NULL, dstSize_to= oSmall, "NULL not handled"); + RETURN_ERROR_IF(litSize > blockSizeMax, corruption_detecte= d, ""); RETURN_ERROR_IF(expectedWriteSize < litSize, dstSize_tooSm= all, ""); ZSTD_allocateLiteralsBuffer(dctx, dst, dstCapacity, litSiz= e, streaming, expectedWriteSize, 1); if (lhSize+litSize+WILDCOPY_OVERLENGTH > srcSize) { /* ri= sk reading beyond src buffer with wildcopy */ @@ -270,7 +299,7 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, case set_rle: { U32 const lhlCode =3D ((istart[0]) >> 2) & 3; size_t litSize, lhSize; - size_t expectedWriteSize =3D MIN(ZSTD_BLOCKSIZE_MAX, dstCa= pacity); + size_t expectedWriteSize =3D MIN(blockSizeMax, dstCapacity= ); switch(lhlCode) { case 0: case 2: default: /* note : default is impossible= , since lhlCode into [0..3] */ @@ -279,16 +308,17 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, break; case 1: lhSize =3D 2; + RETURN_ERROR_IF(srcSize<3, corruption_detected, "srcSi= ze >=3D MIN_CBLOCK_SIZE =3D=3D 2; here we need lhSize+1 =3D 3"); litSize =3D MEM_readLE16(istart) >> 4; break; case 3: lhSize =3D 3; + RETURN_ERROR_IF(srcSize<4, corruption_detected, "srcSi= ze >=3D MIN_CBLOCK_SIZE =3D=3D 2; here we need lhSize+1 =3D 4"); litSize =3D MEM_readLE24(istart) >> 4; - RETURN_ERROR_IF(srcSize<4, corruption_detected, "srcSi= ze >=3D MIN_CBLOCK_SIZE =3D=3D 3; here we need lhSize+1 =3D 4"); break; } RETURN_ERROR_IF(litSize > 0 && dst =3D=3D NULL, dstSize_to= oSmall, "NULL not handled"); - RETURN_ERROR_IF(litSize > ZSTD_BLOCKSIZE_MAX, corruption_d= etected, ""); + RETURN_ERROR_IF(litSize > blockSizeMax, corruption_detecte= d, ""); RETURN_ERROR_IF(expectedWriteSize < litSize, dstSize_tooSm= all, ""); ZSTD_allocateLiteralsBuffer(dctx, dst, dstCapacity, litSiz= e, streaming, expectedWriteSize, 1); if (dctx->litBufferLocation =3D=3D ZSTD_split) @@ -310,6 +340,18 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, } } =20 +/* Hidden declaration for fullbench */ +size_t ZSTD_decodeLiteralsBlock_wrapper(ZSTD_DCtx* dctx, + const void* src, size_t srcSize, + void* dst, size_t dstCapacity); +size_t ZSTD_decodeLiteralsBlock_wrapper(ZSTD_DCtx* dctx, + const void* src, size_t srcSize, + void* dst, size_t dstCapacity) +{ + dctx->isFrameDecompression =3D 0; + return ZSTD_decodeLiteralsBlock(dctx, src, srcSize, dst, dstCapacity, = not_streaming); +} + /* Default FSE distribution tables. * These are pre-calculated FSE decoding tables using default distribution= s as defined in specification : * https://github.com/facebook/zstd/blob/release/doc/zstd_compression_form= at.md#default-distributions @@ -317,7 +359,7 @@ size_t ZSTD_decodeLiteralsBlock(ZSTD_DCtx* dctx, * - start from default distributions, present in /lib/common/zstd_interna= l.h * - generate tables normally, using ZSTD_buildFSETable() * - printout the content of tables - * - pretify output, report below, test with fuzzer to ensure it's correct= */ + * - prettify output, report below, test with fuzzer to ensure it's correc= t */ =20 /* Default FSE distribution table for Literal Lengths */ static const ZSTD_seqSymbol LL_defaultDTable[(1<=3D0); + pos +=3D (size_t)n; } } /* Now we spread those positions across the table. - * The benefit of doing it in two stages is that we avoid the the + * The benefit of doing it in two stages is that we avoid the * variable size inner loop, which caused lots of branch misses. 
* Now we can run through all the positions without any branch mis= ses. - * We unroll the loop twice, since that is what emperically worked= best. + * We unroll the loop twice, since that is what empirically worked= best. */ { size_t position =3D 0; @@ -540,7 +583,7 @@ void ZSTD_buildFSETable_body(ZSTD_seqSymbol* dt, for (i=3D0; i highThreshold) position =3D (position + = step) & tableMask; /* lowprob area */ + while (UNLIKELY(position > highThreshold)) position =3D (p= osition + step) & tableMask; /* lowprob area */ } } assert(position =3D=3D 0); /* position must reach all cells once, = otherwise normalizedCounter is incorrect */ } @@ -551,7 +594,7 @@ void ZSTD_buildFSETable_body(ZSTD_seqSymbol* dt, for (u=3D0; u 0x7F) { if (nbSeq =3D=3D 0xFF) { RETURN_ERROR_IF(ip+2 > iend, srcSize_wrong, ""); @@ -681,11 +719,19 @@ size_t ZSTD_decodeSeqHeaders(ZSTD_DCtx* dctx, int* nb= SeqPtr, } *nbSeqPtr =3D nbSeq; =20 + if (nbSeq =3D=3D 0) { + /* No sequence : section ends immediately */ + RETURN_ERROR_IF(ip !=3D iend, corruption_detected, + "extraneous data present in the Sequences section"); + return (size_t)(ip - istart); + } + /* FSE table descriptors */ RETURN_ERROR_IF(ip+1 > iend, srcSize_wrong, ""); /* minimum possible s= ize: 1 byte for symbol encoding types */ - { symbolEncodingType_e const LLtype =3D (symbolEncodingType_e)(*ip >= > 6); - symbolEncodingType_e const OFtype =3D (symbolEncodingType_e)((*ip = >> 4) & 3); - symbolEncodingType_e const MLtype =3D (symbolEncodingType_e)((*ip = >> 2) & 3); + RETURN_ERROR_IF(*ip & 3, corruption_detected, ""); /* The last field, = Reserved, must be all-zeroes. */ + { SymbolEncodingType_e const LLtype =3D (SymbolEncodingType_e)(*ip >= > 6); + SymbolEncodingType_e const OFtype =3D (SymbolEncodingType_e)((*ip = >> 4) & 3); + SymbolEncodingType_e const MLtype =3D (SymbolEncodingType_e)((*ip = >> 2) & 3); ip++; =20 /* Build DTables */ @@ -829,7 +875,7 @@ static void ZSTD_safecopy(BYTE* op, const BYTE* const o= end_w, BYTE const* ip, pt /* ZSTD_safecopyDstBeforeSrc(): * This version allows overlap with dst before src, or handles the non-ove= rlap case with dst after src * Kept separate from more common ZSTD_safecopy case to avoid performance = impact to the safecopy common case */ -static void ZSTD_safecopyDstBeforeSrc(BYTE* op, BYTE const* ip, ptrdiff_t = length) { +static void ZSTD_safecopyDstBeforeSrc(BYTE* op, const BYTE* ip, ptrdiff_t = length) { ptrdiff_t const diff =3D op - ip; BYTE* const oend =3D op + length; =20 @@ -858,6 +904,7 @@ static void ZSTD_safecopyDstBeforeSrc(BYTE* op, BYTE co= nst* ip, ptrdiff_t length * to be optimized for many small sequences, since those fall into ZSTD_ex= ecSequence(). */ FORCE_NOINLINE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR size_t ZSTD_execSequenceEnd(BYTE* op, BYTE* const oend, seq_t sequence, const BYTE** litPtr, const BYTE* const litLimit, @@ -905,6 +952,7 @@ size_t ZSTD_execSequenceEnd(BYTE* op, * This version is intended to be used during instances where the litBuffe= r is still split. It is kept separate to avoid performance impact for the = good case. 
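
The nbSeq header decoded at the top of ZSTD_decodeSeqHeaders() above follows the Sequences_Section_Header encoding of RFC 8878: a first byte below 128 is the count itself, two bytes cover counts up to 0x7EFF, and the 0xFF escape adds a little-endian 16-bit value to LONGNBSEQ (0x7F00). A standalone sketch of just that decoding:

  /* returns header length in bytes, or 0 if more input is needed */
  static size_t demo_parse_nbSeq(const unsigned char* ip, size_t srcSize,
                                 unsigned* nbSeq)
  {
      if (srcSize < 1) return 0;
      if (ip[0] < 128) { *nbSeq = ip[0]; return 1; }
      if (ip[0] < 255) {
          if (srcSize < 2) return 0;
          *nbSeq = ((unsigned)(ip[0] - 0x80) << 8) + ip[1];
          return 2;
      }
      if (srcSize < 3) return 0;
      *nbSeq = ip[1] + ((unsigned)ip[2] << 8) + 0x7F00;   /* LONGNBSEQ */
      return 3;
  }
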
*/ FORCE_NOINLINE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR size_t ZSTD_execSequenceEndSplitLitBuffer(BYTE* op, BYTE* const oend, const BYTE* const oend_w, seq_t sequence, const BYTE** litPtr, const BYTE* const litLimit, @@ -950,6 +998,7 @@ size_t ZSTD_execSequenceEndSplitLitBuffer(BYTE* op, } =20 HINT_INLINE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR size_t ZSTD_execSequence(BYTE* op, BYTE* const oend, seq_t sequence, const BYTE** litPtr, const BYTE* const litLimit, @@ -964,6 +1013,11 @@ size_t ZSTD_execSequence(BYTE* op, =20 assert(op !=3D NULL /* Precondition */); assert(oend_w < oend /* No underflow */); + +#if defined(__aarch64__) + /* prefetch sequence starting from match that will be used for copy la= ter */ + PREFETCH_L1(match); +#endif /* Handle edge cases in a slow path: * - Read beyond end of literals * - Match end is within WILDCOPY_OVERLIMIT of oend @@ -1043,6 +1097,7 @@ size_t ZSTD_execSequence(BYTE* op, } =20 HINT_INLINE +ZSTD_ALLOW_POINTER_OVERFLOW_ATTR size_t ZSTD_execSequenceSplitLitBuffer(BYTE* op, BYTE* const oend, const BYTE* const oend_w, seq_t sequence, const BYTE** litPtr, const BYTE* const litLimit, @@ -1154,7 +1209,7 @@ ZSTD_updateFseStateWithDInfo(ZSTD_fseState* DStatePtr= , BIT_DStream_t* bitD, U16 } =20 /* We need to add at most (ZSTD_WINDOWLOG_MAX_32 - 1) bits to read the max= imum - * offset bits. But we can only read at most (STREAM_ACCUMULATOR_MIN_32 - = 1) + * offset bits. But we can only read at most STREAM_ACCUMULATOR_MIN_32 * bits before reloading. This value is the maximum number of bytes we read * after reloading when we are decoding long offsets. */ @@ -1165,13 +1220,37 @@ ZSTD_updateFseStateWithDInfo(ZSTD_fseState* DStateP= tr, BIT_DStream_t* bitD, U16 =20 typedef enum { ZSTD_lo_isRegularOffset, ZSTD_lo_isLongOffset=3D1 } ZSTD_lo= ngOffset_e; =20 +/* + * ZSTD_decodeSequence(): + * @p longOffsets : tells the decoder to reload more bit while decoding la= rge offsets + * only used in 32-bit mode + * @return : Sequence (litL + matchL + offset) + */ FORCE_INLINE_TEMPLATE seq_t -ZSTD_decodeSequence(seqState_t* seqState, const ZSTD_longOffset_e longOffs= ets) +ZSTD_decodeSequence(seqState_t* seqState, const ZSTD_longOffset_e longOffs= ets, const int isLastSeq) { seq_t seq; + /* + * ZSTD_seqSymbol is a 64 bits wide structure. + * It can be loaded in one operation + * and its fields extracted by simply shifting or bit-extracting on aa= rch64. + * GCC doesn't recognize this and generates more unnecessary ldr/ldrb/= ldrh + * operations that cause performance drop. This can be avoided by usin= g this + * ZSTD_memcpy hack. 
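
A self-contained illustration of that hack, using a stand-in struct with the same 8-byte layout as ZSTD_seqSymbol (U16, BYTE, BYTE, U32): copying the table entry out through memcpy lets GCC on aarch64 materialize it as a single 64-bit ldr and extract the fields with shifts, instead of issuing separate ldrh/ldrb/ldr loads per field.

  #include <string.h>
  #include <stdint.h>

  typedef struct {
      uint16_t nextState;
      uint8_t  nbAdditionalBits;
      uint8_t  nbBits;
      uint32_t baseValue;
  } demo_entry;   /* 8 bytes, mirroring ZSTD_seqSymbol */

  static uint32_t demo_load(const demo_entry* table, size_t state)
  {
      demo_entry e;
      memcpy(&e, table + state, sizeof(e));   /* one 64-bit load */
      return e.baseValue >> e.nbBits;         /* fields come out as bit extracts */
  }
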
+ */ +#if defined(__aarch64__) && (defined(__GNUC__) && !defined(__clang__)) + ZSTD_seqSymbol llDInfoS, mlDInfoS, ofDInfoS; + ZSTD_seqSymbol* const llDInfo =3D &llDInfoS; + ZSTD_seqSymbol* const mlDInfo =3D &mlDInfoS; + ZSTD_seqSymbol* const ofDInfo =3D &ofDInfoS; + ZSTD_memcpy(llDInfo, seqState->stateLL.table + seqState->stateLL.state= , sizeof(ZSTD_seqSymbol)); + ZSTD_memcpy(mlDInfo, seqState->stateML.table + seqState->stateML.state= , sizeof(ZSTD_seqSymbol)); + ZSTD_memcpy(ofDInfo, seqState->stateOffb.table + seqState->stateOffb.s= tate, sizeof(ZSTD_seqSymbol)); +#else const ZSTD_seqSymbol* const llDInfo =3D seqState->stateLL.table + seqS= tate->stateLL.state; const ZSTD_seqSymbol* const mlDInfo =3D seqState->stateML.table + seqS= tate->stateML.state; const ZSTD_seqSymbol* const ofDInfo =3D seqState->stateOffb.table + se= qState->stateOffb.state; +#endif seq.matchLength =3D mlDInfo->baseValue; seq.litLength =3D llDInfo->baseValue; { U32 const ofBase =3D ofDInfo->baseValue; @@ -1186,28 +1265,31 @@ ZSTD_decodeSequence(seqState_t* seqState, const ZST= D_longOffset_e longOffsets) U32 const llnbBits =3D llDInfo->nbBits; U32 const mlnbBits =3D mlDInfo->nbBits; U32 const ofnbBits =3D ofDInfo->nbBits; + + assert(llBits <=3D MaxLLBits); + assert(mlBits <=3D MaxMLBits); + assert(ofBits <=3D MaxOff); /* * As gcc has better branch and block analyzers, sometimes it is o= nly - * valuable to mark likelyness for clang, it gives around 3-4% of + * valuable to mark likeliness for clang, it gives around 3-4% of * performance. */ =20 /* sequence */ { size_t offset; - #if defined(__clang__) - if (LIKELY(ofBits > 1)) { - #else if (ofBits > 1) { - #endif ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset =3D=3D 1); ZSTD_STATIC_ASSERT(LONG_OFFSETS_MAX_EXTRA_BITS_32 =3D=3D 5= ); - assert(ofBits <=3D MaxOff); + ZSTD_STATIC_ASSERT(STREAM_ACCUMULATOR_MIN_32 > LONG_OFFSET= S_MAX_EXTRA_BITS_32); + ZSTD_STATIC_ASSERT(STREAM_ACCUMULATOR_MIN_32 - LONG_OFFSET= S_MAX_EXTRA_BITS_32 >=3D MaxMLBits); if (MEM_32bits() && longOffsets && (ofBits >=3D STREAM_ACC= UMULATOR_MIN_32)) { - U32 const extraBits =3D ofBits - MIN(ofBits, 32 - seqS= tate->DStream.bitsConsumed); + /* Always read extra bits, this keeps the logic simple, + * avoids branches, and avoids accidentally reading 0 = bits. + */ + U32 const extraBits =3D LONG_OFFSETS_MAX_EXTRA_BITS_32; offset =3D ofBase + (BIT_readBitsFast(&seqState->DStre= am, ofBits - extraBits) << extraBits); BIT_reloadDStream(&seqState->DStream); - if (extraBits) offset +=3D BIT_readBitsFast(&seqState-= >DStream, extraBits); - assert(extraBits <=3D LONG_OFFSETS_MAX_EXTRA_BITS_32);= /* to avoid another reload */ + offset +=3D BIT_readBitsFast(&seqState->DStream, extra= Bits); } else { offset =3D ofBase + BIT_readBitsFast(&seqState->DStrea= m, ofBits/*>0*/); /* <=3D (ZSTD_WINDOWLOG_MAX-1) bits */ if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream= ); @@ -1224,7 +1306,7 @@ ZSTD_decodeSequence(seqState_t* seqState, const ZSTD_= longOffset_e longOffsets) } else { offset =3D ofBase + ll0 + BIT_readBitsFast(&seqState->= DStream, 1); { size_t temp =3D (offset=3D=3D3) ? 
seqState->prevOf= fset[0] - 1 : seqState->prevOffset[offset]; - temp +=3D !temp; /* 0 is not valid; input is cor= rupted; force offset to 1 */ + temp -=3D !temp; /* 0 is not valid: input corrupte= d =3D> force offset to -1 =3D> corruption detected at execSequence */ if (offset !=3D 1) seqState->prevOffset[2] =3D seq= State->prevOffset[1]; seqState->prevOffset[1] =3D seqState->prevOffset[0= ]; seqState->prevOffset[0] =3D offset =3D temp; @@ -1232,11 +1314,7 @@ ZSTD_decodeSequence(seqState_t* seqState, const ZSTD= _longOffset_e longOffsets) seq.offset =3D offset; } =20 - #if defined(__clang__) - if (UNLIKELY(mlBits > 0)) - #else if (mlBits > 0) - #endif seq.matchLength +=3D BIT_readBitsFast(&seqState->DStream, mlBi= ts/*>0*/); =20 if (MEM_32bits() && (mlBits+llBits >=3D STREAM_ACCUMULATOR_MIN_32-= LONG_OFFSETS_MAX_EXTRA_BITS_32)) @@ -1246,11 +1324,7 @@ ZSTD_decodeSequence(seqState_t* seqState, const ZSTD= _longOffset_e longOffsets) /* Ensure there are enough bits to read the rest of data in 64-bit= mode. */ ZSTD_STATIC_ASSERT(16+LLFSELog+MLFSELog+OffFSELog < STREAM_ACCUMUL= ATOR_MIN_64); =20 - #if defined(__clang__) - if (UNLIKELY(llBits > 0)) - #else if (llBits > 0) - #endif seq.litLength +=3D BIT_readBitsFast(&seqState->DStream, llBits= /*>0*/); =20 if (MEM_32bits()) @@ -1259,17 +1333,22 @@ ZSTD_decodeSequence(seqState_t* seqState, const ZST= D_longOffset_e longOffsets) DEBUGLOG(6, "seq: litL=3D%u, matchL=3D%u, offset=3D%u", (U32)seq.litLength, (U32)seq.matchLength, (U32)seq.off= set); =20 - ZSTD_updateFseStateWithDInfo(&seqState->stateLL, &seqState->DStrea= m, llNext, llnbBits); /* <=3D 9 bits */ - ZSTD_updateFseStateWithDInfo(&seqState->stateML, &seqState->DStrea= m, mlNext, mlnbBits); /* <=3D 9 bits */ - if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream); /* <= =3D 18 bits */ - ZSTD_updateFseStateWithDInfo(&seqState->stateOffb, &seqState->DStr= eam, ofNext, ofnbBits); /* <=3D 8 bits */ + if (!isLastSeq) { + /* don't update FSE state for last Sequence */ + ZSTD_updateFseStateWithDInfo(&seqState->stateLL, &seqState->DS= tream, llNext, llnbBits); /* <=3D 9 bits */ + ZSTD_updateFseStateWithDInfo(&seqState->stateML, &seqState->DS= tream, mlNext, mlnbBits); /* <=3D 9 bits */ + if (MEM_32bits()) BIT_reloadDStream(&seqState->DStream); /*= <=3D 18 bits */ + ZSTD_updateFseStateWithDInfo(&seqState->stateOffb, &seqState->= DStream, ofNext, ofnbBits); /* <=3D 8 bits */ + BIT_reloadDStream(&seqState->DStream); + } } =20 return seq; } =20 -#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION -MEM_STATIC int ZSTD_dictionaryIsActive(ZSTD_DCtx const* dctx, BYTE const* = prefixStart, BYTE const* oLitEnd) +#if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) +#if DEBUGLEVEL >=3D 1 +static int ZSTD_dictionaryIsActive(ZSTD_DCtx const* dctx, BYTE const* pref= ixStart, BYTE const* oLitEnd) { size_t const windowSize =3D dctx->fParams.windowSize; /* No dictionary used. */ @@ -1283,30 +1362,33 @@ MEM_STATIC int ZSTD_dictionaryIsActive(ZSTD_DCtx co= nst* dctx, BYTE const* prefix /* Dictionary is active. 
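
The prevOffset[] juggling above implements the repeat-offset rules of RFC 8878: offset values 1..3 index the recent-offsets history, shifted by one when the sequence carries no literals, with "3 and zero literals" meaning the most recent offset minus one. A sketch of the resolution rule in isolation, with the history update omitted and rep[0] holding the most recent offset:

  static unsigned demo_resolve_offset(unsigned offsetValue, unsigned litLength,
                                      const unsigned rep[3])
  {
      if (offsetValue > 3)
          return offsetValue - 3;   /* a plain offset, not a repeat code */
      if (litLength == 0)
          offsetValue += 1;         /* 1->rep[1], 2->rep[2], 3->rep[0]-1 */
      switch (offsetValue) {
      case 1:  return rep[0];
      case 2:  return rep[1];
      case 3:  return rep[2];
      default: return rep[0] - 1;   /* a result of 0 is invalid input; the
                                       `temp -= !temp` change above turns it
                                       into an out-of-range offset that is
                                       caught in ZSTD_execSequence */
      }
  }
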
*/ return 1; } +#endif =20 -MEM_STATIC void ZSTD_assertValidSequence( +static void ZSTD_assertValidSequence( ZSTD_DCtx const* dctx, BYTE const* op, BYTE const* oend, seq_t const seq, BYTE const* prefixStart, BYTE const* virtualStart) { #if DEBUGLEVEL >=3D 1 - size_t const windowSize =3D dctx->fParams.windowSize; - size_t const sequenceSize =3D seq.litLength + seq.matchLength; - BYTE const* const oLitEnd =3D op + seq.litLength; - DEBUGLOG(6, "Checking sequence: litL=3D%u matchL=3D%u offset=3D%u", - (U32)seq.litLength, (U32)seq.matchLength, (U32)seq.offset); - assert(op <=3D oend); - assert((size_t)(oend - op) >=3D sequenceSize); - assert(sequenceSize <=3D ZSTD_BLOCKSIZE_MAX); - if (ZSTD_dictionaryIsActive(dctx, prefixStart, oLitEnd)) { - size_t const dictSize =3D (size_t)((char const*)dctx->dictContentE= ndForFuzzing - (char const*)dctx->dictContentBeginForFuzzing); - /* Offset must be within the dictionary. */ - assert(seq.offset <=3D (size_t)(oLitEnd - virtualStart)); - assert(seq.offset <=3D windowSize + dictSize); - } else { - /* Offset must be within our window. */ - assert(seq.offset <=3D windowSize); + if (dctx->isFrameDecompression) { + size_t const windowSize =3D dctx->fParams.windowSize; + size_t const sequenceSize =3D seq.litLength + seq.matchLength; + BYTE const* const oLitEnd =3D op + seq.litLength; + DEBUGLOG(6, "Checking sequence: litL=3D%u matchL=3D%u offset=3D%u", + (U32)seq.litLength, (U32)seq.matchLength, (U32)seq.offset); + assert(op <=3D oend); + assert((size_t)(oend - op) >=3D sequenceSize); + assert(sequenceSize <=3D ZSTD_blockSizeMax(dctx)); + if (ZSTD_dictionaryIsActive(dctx, prefixStart, oLitEnd)) { + size_t const dictSize =3D (size_t)((char const*)dctx->dictCont= entEndForFuzzing - (char const*)dctx->dictContentBeginForFuzzing); + /* Offset must be within the dictionary. */ + assert(seq.offset <=3D (size_t)(oLitEnd - virtualStart)); + assert(seq.offset <=3D windowSize + dictSize); + } else { + /* Offset must be within our window. 
*/ + assert(seq.offset <=3D windowSize); + } } #else (void)dctx, (void)op, (void)oend, (void)seq, (void)prefixStart, (void)= virtualStart; @@ -1322,23 +1404,21 @@ DONT_VECTORIZE ZSTD_decompressSequences_bodySplitLitBuffer( ZSTD_DCtx* dctx, void* dst, size_t maxDstSize, const void* seqStart, size_t seqSize, int nbSeq, - const ZSTD_longOffset_e isLongOffset, - const int frame) + const ZSTD_longOffset_e isLongOffset) { const BYTE* ip =3D (const BYTE*)seqStart; const BYTE* const iend =3D ip + seqSize; BYTE* const ostart =3D (BYTE*)dst; - BYTE* const oend =3D ostart + maxDstSize; + BYTE* const oend =3D ZSTD_maybeNullPtrAdd(ostart, maxDstSize); BYTE* op =3D ostart; const BYTE* litPtr =3D dctx->litPtr; const BYTE* litBufferEnd =3D dctx->litBufferEnd; const BYTE* const prefixStart =3D (const BYTE*) (dctx->prefixStart); const BYTE* const vBase =3D (const BYTE*) (dctx->virtualStart); const BYTE* const dictEnd =3D (const BYTE*) (dctx->dictEnd); - DEBUGLOG(5, "ZSTD_decompressSequences_bodySplitLitBuffer"); - (void)frame; + DEBUGLOG(5, "ZSTD_decompressSequences_bodySplitLitBuffer (%i seqs)", n= bSeq); =20 - /* Regen sequences */ + /* Literals are split between internal buffer & output buffer */ if (nbSeq) { seqState_t seqState; dctx->fseEntropy =3D 1; @@ -1357,8 +1437,7 @@ ZSTD_decompressSequences_bodySplitLitBuffer( ZSTD_DCt= x* dctx, BIT_DStream_completed < BIT_DStream_overflow); =20 /* decompress without overrunning litPtr begins */ - { - seq_t sequence =3D ZSTD_decodeSequence(&seqState, isLongOffset= ); + { seq_t sequence =3D {0,0,0}; /* some static analyzer believe t= hat @sequence is not initialized (it necessarily is, since for(;;) loop as = at least one iteration) */ /* Align the decompression loop to 32 + 16 bytes. * * zstd compiled with gcc-9 on an Intel i9-9900k shows 10% = decompression @@ -1420,27 +1499,26 @@ ZSTD_decompressSequences_bodySplitLitBuffer( ZSTD_D= Ctx* dctx, #endif =20 /* Handle the initial state where litBuffer is currently split= between dst and litExtraBuffer */ - for (; litPtr + sequence.litLength <=3D dctx->litBufferEnd; ) { - size_t const oneSeqSize =3D ZSTD_execSequenceSplitLitBuffe= r(op, oend, litPtr + sequence.litLength - WILDCOPY_OVERLENGTH, sequence, &l= itPtr, litBufferEnd, prefixStart, vBase, dictEnd); + for ( ; nbSeq; nbSeq--) { + sequence =3D ZSTD_decodeSequence(&seqState, isLongOffset, = nbSeq=3D=3D1); + if (litPtr + sequence.litLength > dctx->litBufferEnd) brea= k; + { size_t const oneSeqSize =3D ZSTD_execSequenceSplitLitB= uffer(op, oend, litPtr + sequence.litLength - WILDCOPY_OVERLENGTH, sequence= , &litPtr, litBufferEnd, prefixStart, vBase, dictEnd); #if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) - assert(!ZSTD_isError(oneSeqSize)); - if (frame) ZSTD_assertValidSequence(dctx, op, oend, sequen= ce, prefixStart, vBase); + assert(!ZSTD_isError(oneSeqSize)); + ZSTD_assertValidSequence(dctx, op, oend, sequence, pre= fixStart, vBase); #endif - if (UNLIKELY(ZSTD_isError(oneSeqSize))) - return oneSeqSize; - DEBUGLOG(6, "regenerated sequence size : %u", (U32)oneSeqS= ize); - op +=3D oneSeqSize; - if (UNLIKELY(!--nbSeq)) - break; - BIT_reloadDStream(&(seqState.DStream)); - sequence =3D ZSTD_decodeSequence(&seqState, isLongOffset); - } + if (UNLIKELY(ZSTD_isError(oneSeqSize))) + return oneSeqSize; + DEBUGLOG(6, "regenerated sequence size : %u", (U32)one= SeqSize); + op +=3D oneSeqSize; + } } + DEBUGLOG(6, "reached: (litPtr + sequence.litLength > dctx->lit= BufferEnd)"); =20 /* If there are more sequences, they will 
need to read literal= s from litExtraBuffer; copy over the remainder from dst and update litPtr a= nd litEnd */ if (nbSeq > 0) { const size_t leftoverLit =3D dctx->litBufferEnd - litPtr; - if (leftoverLit) - { + DEBUGLOG(6, "There are %i sequences left, and %zu/%zu lite= rals left in buffer", nbSeq, leftoverLit, sequence.litLength); + if (leftoverLit) { RETURN_ERROR_IF(leftoverLit > (size_t)(oend - op), dst= Size_tooSmall, "remaining lit must fit within dstBuffer"); ZSTD_safecopyDstBeforeSrc(op, litPtr, leftoverLit); sequence.litLength -=3D leftoverLit; @@ -1449,24 +1527,22 @@ ZSTD_decompressSequences_bodySplitLitBuffer( ZSTD_D= Ctx* dctx, litPtr =3D dctx->litExtraBuffer; litBufferEnd =3D dctx->litExtraBuffer + ZSTD_LITBUFFEREXTR= ASIZE; dctx->litBufferLocation =3D ZSTD_not_in_dst; - { - size_t const oneSeqSize =3D ZSTD_execSequence(op, oend= , sequence, &litPtr, litBufferEnd, prefixStart, vBase, dictEnd); + { size_t const oneSeqSize =3D ZSTD_execSequence(op, oend= , sequence, &litPtr, litBufferEnd, prefixStart, vBase, dictEnd); #if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) assert(!ZSTD_isError(oneSeqSize)); - if (frame) ZSTD_assertValidSequence(dctx, op, oend, se= quence, prefixStart, vBase); + ZSTD_assertValidSequence(dctx, op, oend, sequence, pre= fixStart, vBase); #endif if (UNLIKELY(ZSTD_isError(oneSeqSize))) return oneSeqSize; DEBUGLOG(6, "regenerated sequence size : %u", (U32)one= SeqSize); op +=3D oneSeqSize; - if (--nbSeq) - BIT_reloadDStream(&(seqState.DStream)); } + nbSeq--; } } =20 - if (nbSeq > 0) /* there is remaining lit from extra buffer */ - { + if (nbSeq > 0) { + /* there is remaining lit from extra buffer */ =20 #if defined(__x86_64__) __asm__(".p2align 6"); @@ -1485,35 +1561,34 @@ ZSTD_decompressSequences_bodySplitLitBuffer( ZSTD_D= Ctx* dctx, # endif #endif =20 - for (; ; ) { - seq_t const sequence =3D ZSTD_decodeSequence(&seqState, is= LongOffset); + for ( ; nbSeq ; nbSeq--) { + seq_t const sequence =3D ZSTD_decodeSequence(&seqState, is= LongOffset, nbSeq=3D=3D1); size_t const oneSeqSize =3D ZSTD_execSequence(op, oend, se= quence, &litPtr, litBufferEnd, prefixStart, vBase, dictEnd); #if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) assert(!ZSTD_isError(oneSeqSize)); - if (frame) ZSTD_assertValidSequence(dctx, op, oend, sequen= ce, prefixStart, vBase); + ZSTD_assertValidSequence(dctx, op, oend, sequence, prefixS= tart, vBase); #endif if (UNLIKELY(ZSTD_isError(oneSeqSize))) return oneSeqSize; DEBUGLOG(6, "regenerated sequence size : %u", (U32)oneSeqS= ize); op +=3D oneSeqSize; - if (UNLIKELY(!--nbSeq)) - break; - BIT_reloadDStream(&(seqState.DStream)); } } =20 /* check if reached exact end */ DEBUGLOG(5, "ZSTD_decompressSequences_bodySplitLitBuffer: after de= code loop, remaining nbSeq : %i", nbSeq); RETURN_ERROR_IF(nbSeq, corruption_detected, ""); - RETURN_ERROR_IF(BIT_reloadDStream(&seqState.DStream) < BIT_DStream= _completed, corruption_detected, ""); + DEBUGLOG(5, "bitStream : start=3D%p, ptr=3D%p, bitsConsumed=3D%u",= seqState.DStream.start, seqState.DStream.ptr, seqState.DStream.bitsConsume= d); + RETURN_ERROR_IF(!BIT_endOfDStream(&seqState.DStream), corruption_d= etected, ""); /* save reps for next block */ { U32 i; for (i=3D0; ientropy.rep[i] =3D= (U32)(seqState.prevOffset[i]); } } =20 /* last literal segment */ - if (dctx->litBufferLocation =3D=3D ZSTD_split) /* split hasn't been r= eached yet, first get dst then copy litExtraBuffer */ - { - size_t const 
lastLLSize =3D litBufferEnd - litPtr; + if (dctx->litBufferLocation =3D=3D ZSTD_split) { + /* split hasn't been reached yet, first get dst then copy litExtra= Buffer */ + size_t const lastLLSize =3D (size_t)(litBufferEnd - litPtr); + DEBUGLOG(6, "copy last literals from segment : %u", (U32)lastLLSiz= e); RETURN_ERROR_IF(lastLLSize > (size_t)(oend - op), dstSize_tooSmall= , ""); if (op !=3D NULL) { ZSTD_memmove(op, litPtr, lastLLSize); @@ -1523,15 +1598,17 @@ ZSTD_decompressSequences_bodySplitLitBuffer( ZSTD_D= Ctx* dctx, litBufferEnd =3D dctx->litExtraBuffer + ZSTD_LITBUFFEREXTRASIZE; dctx->litBufferLocation =3D ZSTD_not_in_dst; } - { size_t const lastLLSize =3D litBufferEnd - litPtr; + /* copy last literals from internal buffer */ + { size_t const lastLLSize =3D (size_t)(litBufferEnd - litPtr); + DEBUGLOG(6, "copy last literals from internal buffer : %u", (U32)l= astLLSize); RETURN_ERROR_IF(lastLLSize > (size_t)(oend-op), dstSize_tooSmall, = ""); if (op !=3D NULL) { ZSTD_memcpy(op, litPtr, lastLLSize); op +=3D lastLLSize; - } - } + } } =20 - return op-ostart; + DEBUGLOG(6, "decoded block of size %u bytes", (U32)(op - ostart)); + return (size_t)(op - ostart); } =20 FORCE_INLINE_TEMPLATE size_t @@ -1539,21 +1616,19 @@ DONT_VECTORIZE ZSTD_decompressSequences_body(ZSTD_DCtx* dctx, void* dst, size_t maxDstSize, const void* seqStart, size_t seqSize, int nbSeq, - const ZSTD_longOffset_e isLongOffset, - const int frame) + const ZSTD_longOffset_e isLongOffset) { const BYTE* ip =3D (const BYTE*)seqStart; const BYTE* const iend =3D ip + seqSize; BYTE* const ostart =3D (BYTE*)dst; - BYTE* const oend =3D dctx->litBufferLocation =3D=3D ZSTD_not_in_dst ? = ostart + maxDstSize : dctx->litBuffer; + BYTE* const oend =3D dctx->litBufferLocation =3D=3D ZSTD_not_in_dst ? 
= ZSTD_maybeNullPtrAdd(ostart, maxDstSize) : dctx->litBuffer; BYTE* op =3D ostart; const BYTE* litPtr =3D dctx->litPtr; const BYTE* const litEnd =3D litPtr + dctx->litSize; const BYTE* const prefixStart =3D (const BYTE*)(dctx->prefixStart); const BYTE* const vBase =3D (const BYTE*)(dctx->virtualStart); const BYTE* const dictEnd =3D (const BYTE*)(dctx->dictEnd); - DEBUGLOG(5, "ZSTD_decompressSequences_body"); - (void)frame; + DEBUGLOG(5, "ZSTD_decompressSequences_body: nbSeq =3D %d", nbSeq); =20 /* Regen sequences */ if (nbSeq) { @@ -1568,11 +1643,6 @@ ZSTD_decompressSequences_body(ZSTD_DCtx* dctx, ZSTD_initFseState(&seqState.stateML, &seqState.DStream, dctx->MLTp= tr); assert(dst !=3D NULL); =20 - ZSTD_STATIC_ASSERT( - BIT_DStream_unfinished < BIT_DStream_completed && - BIT_DStream_endOfBuffer < BIT_DStream_completed && - BIT_DStream_completed < BIT_DStream_overflow); - #if defined(__x86_64__) __asm__(".p2align 6"); __asm__("nop"); @@ -1587,73 +1657,70 @@ ZSTD_decompressSequences_body(ZSTD_DCtx* dctx, # endif #endif =20 - for ( ; ; ) { - seq_t const sequence =3D ZSTD_decodeSequence(&seqState, isLong= Offset); + for ( ; nbSeq ; nbSeq--) { + seq_t const sequence =3D ZSTD_decodeSequence(&seqState, isLong= Offset, nbSeq=3D=3D1); size_t const oneSeqSize =3D ZSTD_execSequence(op, oend, sequen= ce, &litPtr, litEnd, prefixStart, vBase, dictEnd); #if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) assert(!ZSTD_isError(oneSeqSize)); - if (frame) ZSTD_assertValidSequence(dctx, op, oend, sequence, = prefixStart, vBase); + ZSTD_assertValidSequence(dctx, op, oend, sequence, prefixStart= , vBase); #endif if (UNLIKELY(ZSTD_isError(oneSeqSize))) return oneSeqSize; DEBUGLOG(6, "regenerated sequence size : %u", (U32)oneSeqSize); op +=3D oneSeqSize; - if (UNLIKELY(!--nbSeq)) - break; - BIT_reloadDStream(&(seqState.DStream)); } =20 /* check if reached exact end */ - DEBUGLOG(5, "ZSTD_decompressSequences_body: after decode loop, rem= aining nbSeq : %i", nbSeq); - RETURN_ERROR_IF(nbSeq, corruption_detected, ""); - RETURN_ERROR_IF(BIT_reloadDStream(&seqState.DStream) < BIT_DStream= _completed, corruption_detected, ""); + assert(nbSeq =3D=3D 0); + RETURN_ERROR_IF(!BIT_endOfDStream(&seqState.DStream), corruption_d= etected, ""); /* save reps for next block */ { U32 i; for (i=3D0; ientropy.rep[i] =3D= (U32)(seqState.prevOffset[i]); } } =20 /* last literal segment */ - { size_t const lastLLSize =3D litEnd - litPtr; + { size_t const lastLLSize =3D (size_t)(litEnd - litPtr); + DEBUGLOG(6, "copy last literals : %u", (U32)lastLLSize); RETURN_ERROR_IF(lastLLSize > (size_t)(oend-op), dstSize_tooSmall, = ""); if (op !=3D NULL) { ZSTD_memcpy(op, litPtr, lastLLSize); op +=3D lastLLSize; - } - } + } } =20 - return op-ostart; + DEBUGLOG(6, "decoded block of size %u bytes", (U32)(op - ostart)); + return (size_t)(op - ostart); } =20 static size_t ZSTD_decompressSequences_default(ZSTD_DCtx* dctx, void* dst, size_t maxDstSize, const void* seqStart, size_t seqSize, int nbSeq, - const ZSTD_longOffset_e isLongOffset, - const int frame) + const ZSTD_longOffset_e isLongOffset) { - return ZSTD_decompressSequences_body(dctx, dst, maxDstSize, seqStart, = seqSize, nbSeq, isLongOffset, frame); + return ZSTD_decompressSequences_body(dctx, dst, maxDstSize, seqStart, = seqSize, nbSeq, isLongOffset); } =20 static size_t ZSTD_decompressSequencesSplitLitBuffer_default(ZSTD_DCtx* dctx, void* dst, size_t maxDstSiz= e, const void* seqStart, size_t seqS= ize, int nbSeq, - const ZSTD_longOffset_e isLongOff= 
set, - const int frame) + const ZSTD_longOffset_e isLongOff= set) { - return ZSTD_decompressSequences_bodySplitLitBuffer(dctx, dst, maxDstSi= ze, seqStart, seqSize, nbSeq, isLongOffset, frame); + return ZSTD_decompressSequences_bodySplitLitBuffer(dctx, dst, maxDstSi= ze, seqStart, seqSize, nbSeq, isLongOffset); } #endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG */ =20 #ifndef ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT =20 -FORCE_INLINE_TEMPLATE size_t -ZSTD_prefetchMatch(size_t prefetchPos, seq_t const sequence, +FORCE_INLINE_TEMPLATE + +size_t ZSTD_prefetchMatch(size_t prefetchPos, seq_t const sequence, const BYTE* const prefixStart, const BYTE* const dictEn= d) { prefetchPos +=3D sequence.litLength; { const BYTE* const matchBase =3D (sequence.offset > prefetchPos) ? = dictEnd : prefixStart; - const BYTE* const match =3D matchBase + prefetchPos - sequence.off= set; /* note : this operation can overflow when seq.offset is really too la= rge, which can only happen when input is corrupted. - = * No consequence though : memory address is only used for prefetching, = not for dereferencing */ + /* note : this operation can overflow when seq.offset is really to= o large, which can only happen when input is corrupted. + * No consequence though : memory address is only used for prefetc= hing, not for dereferencing */ + const BYTE* const match =3D ZSTD_wrappedPtrSub(ZSTD_wrappedPtrAdd(= matchBase, prefetchPos), sequence.offset); PREFETCH_L1(match); PREFETCH_L1(match+CACHELINE_SIZE); /* note := it's safe to invoke PREFETCH() on any memory address, including invalid on= es */ } return prefetchPos + sequence.matchLength; @@ -1668,20 +1735,18 @@ ZSTD_decompressSequencesLong_body( ZSTD_DCtx* dctx, void* dst, size_t maxDstSize, const void* seqStart, size_t seqSize, int nbSeq, - const ZSTD_longOffset_e isLongOffset, - const int frame) + const ZSTD_longOffset_e isLongOffset) { const BYTE* ip =3D (const BYTE*)seqStart; const BYTE* const iend =3D ip + seqSize; BYTE* const ostart =3D (BYTE*)dst; - BYTE* const oend =3D dctx->litBufferLocation =3D=3D ZSTD_in_dst ? dctx= ->litBuffer : ostart + maxDstSize; + BYTE* const oend =3D dctx->litBufferLocation =3D=3D ZSTD_in_dst ? 
dctx= ->litBuffer : ZSTD_maybeNullPtrAdd(ostart, maxDstSize); BYTE* op =3D ostart; const BYTE* litPtr =3D dctx->litPtr; const BYTE* litBufferEnd =3D dctx->litBufferEnd; const BYTE* const prefixStart =3D (const BYTE*) (dctx->prefixStart); const BYTE* const dictStart =3D (const BYTE*) (dctx->virtualStart); const BYTE* const dictEnd =3D (const BYTE*) (dctx->dictEnd); - (void)frame; =20 /* Regen sequences */ if (nbSeq) { @@ -1706,20 +1771,17 @@ ZSTD_decompressSequencesLong_body( ZSTD_initFseState(&seqState.stateML, &seqState.DStream, dctx->MLTp= tr); =20 /* prepare in advance */ - for (seqNb=3D0; (BIT_reloadDStream(&seqState.DStream) <=3D BIT_DSt= ream_completed) && (seqNblitBufferLocation =3D=3D ZSTD_split && litPtr + sequ= ences[(seqNb - ADVANCED_SEQS) & STORED_SEQS_MASK].litLength > dctx->litBuff= erEnd) - { + if (dctx->litBufferLocation =3D=3D ZSTD_split && litPtr + sequ= ences[(seqNb - ADVANCED_SEQS) & STORED_SEQS_MASK].litLength > dctx->litBuff= erEnd) { /* lit buffer is reaching split point, empty out the first= buffer and transition to litExtraBuffer */ const size_t leftoverLit =3D dctx->litBufferEnd - litPtr; if (leftoverLit) @@ -1732,26 +1794,26 @@ ZSTD_decompressSequencesLong_body( litPtr =3D dctx->litExtraBuffer; litBufferEnd =3D dctx->litExtraBuffer + ZSTD_LITBUFFEREXTR= ASIZE; dctx->litBufferLocation =3D ZSTD_not_in_dst; - oneSeqSize =3D ZSTD_execSequence(op, oend, sequences[(seqN= b - ADVANCED_SEQS) & STORED_SEQS_MASK], &litPtr, litBufferEnd, prefixStart,= dictStart, dictEnd); + { size_t const oneSeqSize =3D ZSTD_execSequence(op, oend= , sequences[(seqNb - ADVANCED_SEQS) & STORED_SEQS_MASK], &litPtr, litBuffer= End, prefixStart, dictStart, dictEnd); #if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) - assert(!ZSTD_isError(oneSeqSize)); - if (frame) ZSTD_assertValidSequence(dctx, op, oend, sequen= ces[(seqNb - ADVANCED_SEQS) & STORED_SEQS_MASK], prefixStart, dictStart); + assert(!ZSTD_isError(oneSeqSize)); + ZSTD_assertValidSequence(dctx, op, oend, sequences[(se= qNb - ADVANCED_SEQS) & STORED_SEQS_MASK], prefixStart, dictStart); #endif - if (ZSTD_isError(oneSeqSize)) return oneSeqSize; + if (ZSTD_isError(oneSeqSize)) return oneSeqSize; =20 - prefetchPos =3D ZSTD_prefetchMatch(prefetchPos, sequence, = prefixStart, dictEnd); - sequences[seqNb & STORED_SEQS_MASK] =3D sequence; - op +=3D oneSeqSize; - } + prefetchPos =3D ZSTD_prefetchMatch(prefetchPos, sequen= ce, prefixStart, dictEnd); + sequences[seqNb & STORED_SEQS_MASK] =3D sequence; + op +=3D oneSeqSize; + } } else { /* lit buffer is either wholly contained in first or secon= d split, or not split at all*/ - oneSeqSize =3D dctx->litBufferLocation =3D=3D ZSTD_split ? + size_t const oneSeqSize =3D dctx->litBufferLocation =3D=3D= ZSTD_split ? 
ZSTD_execSequenceSplitLitBuffer(op, oend, litPtr + seq= uences[(seqNb - ADVANCED_SEQS) & STORED_SEQS_MASK].litLength - WILDCOPY_OVE= RLENGTH, sequences[(seqNb - ADVANCED_SEQS) & STORED_SEQS_MASK], &litPtr, li= tBufferEnd, prefixStart, dictStart, dictEnd) : ZSTD_execSequence(op, oend, sequences[(seqNb - ADVANCE= D_SEQS) & STORED_SEQS_MASK], &litPtr, litBufferEnd, prefixStart, dictStart,= dictEnd); #if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) assert(!ZSTD_isError(oneSeqSize)); - if (frame) ZSTD_assertValidSequence(dctx, op, oend, sequen= ces[(seqNb - ADVANCED_SEQS) & STORED_SEQS_MASK], prefixStart, dictStart); + ZSTD_assertValidSequence(dctx, op, oend, sequences[(seqNb = - ADVANCED_SEQS) & STORED_SEQS_MASK], prefixStart, dictStart); #endif if (ZSTD_isError(oneSeqSize)) return oneSeqSize; =20 @@ -1760,17 +1822,15 @@ ZSTD_decompressSequencesLong_body( op +=3D oneSeqSize; } } - RETURN_ERROR_IF(seqNblitBufferLocation =3D=3D ZSTD_split && litPtr + sequ= ence->litLength > dctx->litBufferEnd) - { + if (dctx->litBufferLocation =3D=3D ZSTD_split && litPtr + sequ= ence->litLength > dctx->litBufferEnd) { const size_t leftoverLit =3D dctx->litBufferEnd - litPtr; - if (leftoverLit) - { + if (leftoverLit) { RETURN_ERROR_IF(leftoverLit > (size_t)(oend - op), dst= Size_tooSmall, "remaining lit must fit within dstBuffer"); ZSTD_safecopyDstBeforeSrc(op, litPtr, leftoverLit); sequence->litLength -=3D leftoverLit; @@ -1779,11 +1839,10 @@ ZSTD_decompressSequencesLong_body( litPtr =3D dctx->litExtraBuffer; litBufferEnd =3D dctx->litExtraBuffer + ZSTD_LITBUFFEREXTR= ASIZE; dctx->litBufferLocation =3D ZSTD_not_in_dst; - { - size_t const oneSeqSize =3D ZSTD_execSequence(op, oend= , *sequence, &litPtr, litBufferEnd, prefixStart, dictStart, dictEnd); + { size_t const oneSeqSize =3D ZSTD_execSequence(op, oend= , *sequence, &litPtr, litBufferEnd, prefixStart, dictStart, dictEnd); #if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) assert(!ZSTD_isError(oneSeqSize)); - if (frame) ZSTD_assertValidSequence(dctx, op, oend, se= quences[seqNb&STORED_SEQS_MASK], prefixStart, dictStart); + ZSTD_assertValidSequence(dctx, op, oend, sequences[seq= Nb&STORED_SEQS_MASK], prefixStart, dictStart); #endif if (ZSTD_isError(oneSeqSize)) return oneSeqSize; op +=3D oneSeqSize; @@ -1796,7 +1855,7 @@ ZSTD_decompressSequencesLong_body( ZSTD_execSequence(op, oend, *sequence, &litPtr, litBuf= ferEnd, prefixStart, dictStart, dictEnd); #if defined(FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) && defined(FUZZING_A= SSERT_VALID_SEQUENCE) assert(!ZSTD_isError(oneSeqSize)); - if (frame) ZSTD_assertValidSequence(dctx, op, oend, sequen= ces[seqNb&STORED_SEQS_MASK], prefixStart, dictStart); + ZSTD_assertValidSequence(dctx, op, oend, sequences[seqNb&S= TORED_SEQS_MASK], prefixStart, dictStart); #endif if (ZSTD_isError(oneSeqSize)) return oneSeqSize; op +=3D oneSeqSize; @@ -1808,8 +1867,7 @@ ZSTD_decompressSequencesLong_body( } =20 /* last literal segment */ - if (dctx->litBufferLocation =3D=3D ZSTD_split) /* first deplete liter= al buffer in dst, then copy litExtraBuffer */ - { + if (dctx->litBufferLocation =3D=3D ZSTD_split) { /* first deplete lite= ral buffer in dst, then copy litExtraBuffer */ size_t const lastLLSize =3D litBufferEnd - litPtr; RETURN_ERROR_IF(lastLLSize > (size_t)(oend - op), dstSize_tooSmall= , ""); if (op !=3D NULL) { @@ -1827,17 +1885,16 @@ ZSTD_decompressSequencesLong_body( } } =20 - return op-ostart; + return (size_t)(op - ostart); } =20 
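
Stripped of the split-buffer handling, the long-offsets body above is a classic software pipeline: decode a few sequences ahead of execution, prefetch the memory each pending match will read, then execute with a fixed lag so the prefetches have time to land. All names below are illustrative stand-ins, with DEMO_STORED/DEMO_MASK playing the roles of STORED_SEQS/STORED_SEQS_MASK in the real code:

  typedef struct { unsigned litLength, matchLength, offset; } demo_seq;

  extern demo_seq demo_decodeNext(void);       /* stands in for ZSTD_decodeSequence */
  extern void demo_prefetchMatch(demo_seq);    /* stands in for ZSTD_prefetchMatch */
  extern void demo_execute(demo_seq);          /* stands in for ZSTD_execSequence */

  #define DEMO_STORED 8
  #define DEMO_MASK   (DEMO_STORED - 1)

  static void demo_pipeline(int nbSeq)
  {
      demo_seq ring[DEMO_STORED];
      int const advance = nbSeq < DEMO_STORED ? nbSeq : DEMO_STORED;
      int seqNb;
      for (seqNb = 0; seqNb < advance; seqNb++) {    /* fill the pipeline */
          ring[seqNb] = demo_decodeNext();
          demo_prefetchMatch(ring[seqNb]);
      }
      for (; seqNb < nbSeq; seqNb++) {               /* steady state: execution
                                                        lags decoding by 'advance' */
          demo_seq const cur = demo_decodeNext();
          demo_execute(ring[(seqNb - advance) & DEMO_MASK]);
          demo_prefetchMatch(cur);
          ring[seqNb & DEMO_MASK] = cur;
      }
      for (seqNb -= advance; seqNb < nbSeq; seqNb++) /* drain what remains */
          demo_execute(ring[seqNb & DEMO_MASK]);
  }
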
 static size_t
 ZSTD_decompressSequencesLong_default(ZSTD_DCtx* dctx,
                                  void* dst, size_t maxDstSize,
                                  const void* seqStart, size_t seqSize, int nbSeq,
-                                 const ZSTD_longOffset_e isLongOffset,
-                                 const int frame)
+                                 const ZSTD_longOffset_e isLongOffset)
 {
-    return ZSTD_decompressSequencesLong_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+    return ZSTD_decompressSequencesLong_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
 }
 #endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT */
 
@@ -1851,20 +1908,18 @@ DONT_VECTORIZE
 ZSTD_decompressSequences_bmi2(ZSTD_DCtx* dctx,
                                  void* dst, size_t maxDstSize,
                                  const void* seqStart, size_t seqSize, int nbSeq,
-                                 const ZSTD_longOffset_e isLongOffset,
-                                 const int frame)
+                                 const ZSTD_longOffset_e isLongOffset)
 {
-    return ZSTD_decompressSequences_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+    return ZSTD_decompressSequences_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
 }
 static BMI2_TARGET_ATTRIBUTE size_t
 DONT_VECTORIZE
 ZSTD_decompressSequencesSplitLitBuffer_bmi2(ZSTD_DCtx* dctx,
                                  void* dst, size_t maxDstSize,
                                  const void* seqStart, size_t seqSize, int nbSeq,
-                                 const ZSTD_longOffset_e isLongOffset,
-                                 const int frame)
+                                 const ZSTD_longOffset_e isLongOffset)
 {
-    return ZSTD_decompressSequences_bodySplitLitBuffer(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+    return ZSTD_decompressSequences_bodySplitLitBuffer(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
 }
 #endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG */
 
@@ -1873,50 +1928,40 @@ static BMI2_TARGET_ATTRIBUTE size_t
 ZSTD_decompressSequencesLong_bmi2(ZSTD_DCtx* dctx,
                                  void* dst, size_t maxDstSize,
                                  const void* seqStart, size_t seqSize, int nbSeq,
-                                 const ZSTD_longOffset_e isLongOffset,
-                                 const int frame)
+                                 const ZSTD_longOffset_e isLongOffset)
 {
-    return ZSTD_decompressSequencesLong_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+    return ZSTD_decompressSequencesLong_body(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
 }
 #endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT */
 
 #endif /* DYNAMIC_BMI2 */
 
-typedef size_t (*ZSTD_decompressSequences_t)(
-                            ZSTD_DCtx* dctx,
-                            void* dst, size_t maxDstSize,
-                            const void* seqStart, size_t seqSize, int nbSeq,
-                            const ZSTD_longOffset_e isLongOffset,
-                            const int frame);
-
 #ifndef ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG
 static size_t
 ZSTD_decompressSequences(ZSTD_DCtx* dctx, void* dst, size_t maxDstSize,
                    const void* seqStart, size_t seqSize, int nbSeq,
-                   const ZSTD_longOffset_e isLongOffset,
-                   const int frame)
+                   const ZSTD_longOffset_e isLongOffset)
 {
     DEBUGLOG(5, "ZSTD_decompressSequences");
 #if DYNAMIC_BMI2
     if (ZSTD_DCtx_get_bmi2(dctx)) {
-        return ZSTD_decompressSequences_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+        return ZSTD_decompressSequences_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
     }
 #endif
-    return ZSTD_decompressSequences_default(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+    return ZSTD_decompressSequences_default(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
 }
 static size_t
 ZSTD_decompressSequencesSplitLitBuffer(ZSTD_DCtx* dctx, void* dst, size_t maxDstSize,
                                const void* seqStart, size_t seqSize, int nbSeq,
-                               const ZSTD_longOffset_e isLongOffset,
-                               const int frame)
+                               const ZSTD_longOffset_e isLongOffset)
 {
     DEBUGLOG(5, "ZSTD_decompressSequencesSplitLitBuffer");
 #if DYNAMIC_BMI2
     if (ZSTD_DCtx_get_bmi2(dctx)) {
-        return ZSTD_decompressSequencesSplitLitBuffer_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+        return ZSTD_decompressSequencesSplitLitBuffer_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
     }
 #endif
-    return ZSTD_decompressSequencesSplitLitBuffer_default(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+    return ZSTD_decompressSequencesSplitLitBuffer_default(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
 }
 #endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG */
 
@@ -1931,69 +1976,114 @@ static size_t
 ZSTD_decompressSequencesLong(ZSTD_DCtx* dctx,
                              void* dst, size_t maxDstSize,
                              const void* seqStart, size_t seqSize, int nbSeq,
-                             const ZSTD_longOffset_e isLongOffset,
-                             const int frame)
+                             const ZSTD_longOffset_e isLongOffset)
 {
     DEBUGLOG(5, "ZSTD_decompressSequencesLong");
 #if DYNAMIC_BMI2
     if (ZSTD_DCtx_get_bmi2(dctx)) {
-        return ZSTD_decompressSequencesLong_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+        return ZSTD_decompressSequencesLong_bmi2(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
     }
 #endif
-    return ZSTD_decompressSequencesLong_default(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset, frame);
+    return ZSTD_decompressSequencesLong_default(dctx, dst, maxDstSize, seqStart, seqSize, nbSeq, isLongOffset);
 }
 #endif /* ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT */
 
 
+/*
+ * @returns The total size of the history referenceable by zstd, including
+ * both the prefix and the extDict. At @p op any offset larger than this
+ * is invalid.
+ */
+static size_t ZSTD_totalHistorySize(BYTE* op, BYTE const* virtualStart)
+{
+    return (size_t)(op - virtualStart);
+}
+
+typedef struct {
+    unsigned longOffsetShare;
+    unsigned maxNbAdditionalBits;
+} ZSTD_OffsetInfo;
 
-#if !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT) && \
-    !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG)
-/* ZSTD_getLongOffsetsShare() :
+/* ZSTD_getOffsetInfo() :
  * condition : offTable must be valid
  * @return : "share" of long offsets (arbitrarily defined as > (1<<23))
- *           compared to maximum possible of (1<<OffFSELog) */
-static unsigned
-ZSTD_getLongOffsetsShare(const ZSTD_seqSymbol* offTable)
+ *           compared to maximum possible of (1<<OffFSELog),
+ *           as well as the maximum number of additional bits required.
+ */
+static ZSTD_OffsetInfo
+ZSTD_getOffsetInfo(const ZSTD_seqSymbol* offTable, int nbSeq)
 {
-    const void* ptr = offTable;
-    U32 const tableLog = ((const ZSTD_seqSymbol_header*)ptr)[0].tableLog;
-    const ZSTD_seqSymbol* table = offTable + 1;
-    U32 const max = 1 << tableLog;
-    U32 offCode, total = 0;
-    DEBUGLOG(5, "ZSTD_getLongOffsetsShare: (tableLog=%u)", tableLog);
+    ZSTD_OffsetInfo info = {0, 0};
+    /* If nbSeq == 0, then the offTable is uninitialized, but we have
+     * no sequences, so both values should be 0.
+     */
+    if (nbSeq != 0) {
+        const void* ptr = offTable;
+        U32 const tableLog = ((const ZSTD_seqSymbol_header*)ptr)[0].tableLog;
+        const ZSTD_seqSymbol* table = offTable + 1;
+        U32 const max = 1 << tableLog;
+        U32 u;
+        DEBUGLOG(5, "ZSTD_getLongOffsetsShare: (tableLog=%u)", tableLog);
 
-    assert(max <= (1 << OffFSELog));  /* max not too large */
-    for (offCode=0; offCode<max; offCode++) {
-        if (table[offCode].nbAdditionalBits > 22) total += 1;
+        assert(max <= (1 << OffFSELog));  /* max not too large */
+        for (u=0; u<max; u++) {
+            info.maxNbAdditionalBits = MAX(info.maxNbAdditionalBits, table[u].nbAdditionalBits);
+            if (table[u].nbAdditionalBits > 22) info.longOffsetShare += 1;
+        }
+
+        assert(tableLog <= OffFSELog);
+        info.longOffsetShare <<= (OffFSELog - tableLog);  /* scale to OffFSELog */
     }
 
-    assert(tableLog <= OffFSELog);
-    total <<= (OffFSELog - tableLog);  /* scale to OffFSELog */
 
-    return total;
+    return info;
+}
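A quick gloss on the two fields, since both consumers appear further down
(illustration only, not lines from the import):

    /* An offset code with nbAdditionalBits == N spans offsets on the order
     * of 2^N, so longOffsetShare counts the codes that can reach beyond
     * 1<<23 (8 MiB) of history, while maxNbAdditionalBits bounds the largest
     * offset the block can possibly produce. A table whose
     * maxNbAdditionalBits fits within STREAM_ACCUMULATOR_MIN can never need
     * the long-offset decoder, whatever the window size claims.
     */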
+/*
+ * @returns The maximum offset we can decode in one read of our bitstream, without
+ * reloading more bits in the middle of the offset bits read. Any offsets larger
+ * than this must use the long offset decoder.
+ */
+static size_t ZSTD_maxShortOffset(void)
+{
+    if (MEM_64bits()) {
+        /* We can decode any offset without reloading bits.
+         * This might change if the max window size grows.
+         */
+        ZSTD_STATIC_ASSERT(ZSTD_WINDOWLOG_MAX <= 31);
+        return (size_t)-1;
+    } else {
+        /* The maximum offBase is (1 << (STREAM_ACCUMULATOR_MIN + 1)) - 1.
+         * This offBase would require STREAM_ACCUMULATOR_MIN extra bits.
+         * Then we have to subtract ZSTD_REP_NUM to get the maximum possible offset.
+         */
+        size_t const maxOffbase = ((size_t)1 << (STREAM_ACCUMULATOR_MIN + 1)) - 1;
+        size_t const maxOffset = maxOffbase - ZSTD_REP_NUM;
+        assert(ZSTD_highbit32((U32)maxOffbase) == STREAM_ACCUMULATOR_MIN);
+        return maxOffset;
+    }
 }
-#endif
 
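To put concrete numbers on the 32-bit branch (assuming upstream's constants:
STREAM_ACCUMULATOR_MIN is 25 on 32-bit targets per bitstream.h, and
ZSTD_REP_NUM is 3):

    /* Worked example, constants assumed as stated above:
     *   maxOffbase = ((size_t)1 << (25 + 1)) - 1 = 67108863   (0x3FFFFFF)
     *   maxOffset  = 67108863 - 3                = 67108860   (~64 MiB)
     * So a 32-bit build must route any offset beyond ~64 MiB of history
     * through the long-offset decoder, while a 64-bit build (window log
     * capped at 31) can decode any legal offset in a single bitstream read.
     */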
 size_t
 ZSTD_decompressBlock_internal(ZSTD_DCtx* dctx,
                       void* dst, size_t dstCapacity,
-                      const void* src, size_t srcSize, const int frame, const streaming_operation streaming)
+                      const void* src, size_t srcSize, const streaming_operation streaming)
 {   /* blockType == blockCompressed */
     const BYTE* ip = (const BYTE*)src;
 
-    /* isLongOffset must be true if there are long offsets.
-     * Offsets are long if they are larger than 2^STREAM_ACCUMULATOR_MIN.
-     * We don't expect that to be the case in 64-bit mode.
-     * In block mode, window size is not known, so we have to be conservative.
-     * (note: but it could be evaluated from current-lowLimit)
-     */
-    ZSTD_longOffset_e const isLongOffset = (ZSTD_longOffset_e)(MEM_32bits() && (!frame || (dctx->fParams.windowSize > (1ULL << STREAM_ACCUMULATOR_MIN))));
-    DEBUGLOG(5, "ZSTD_decompressBlock_internal (size : %u)", (U32)srcSize);
-
-    RETURN_ERROR_IF(srcSize >= ZSTD_BLOCKSIZE_MAX, srcSize_wrong, "");
+    DEBUGLOG(5, "ZSTD_decompressBlock_internal (cSize : %u)", (unsigned)srcSize);
+
+    /* Note : the wording of the specification
+     * allows a compressed block to be sized exactly ZSTD_blockSizeMax(dctx).
+     * This generally does not happen, as it makes little sense,
+     * since an uncompressed block would feature the same size and have no decompression cost.
+     * Also, note that decoders from reference libzstd before v1.5.4
+     * would consider this edge case as an error.
+     * As a consequence, avoid generating compressed blocks of size ZSTD_blockSizeMax(dctx)
+     * for broader compatibility with the deployed ecosystem of zstd decoders */
+    RETURN_ERROR_IF(srcSize > ZSTD_blockSizeMax(dctx), srcSize_wrong, "");
 
     /* Decode literals section */
     {   size_t const litCSize = ZSTD_decodeLiteralsBlock(dctx, src, srcSize, dst, dstCapacity, streaming);
-        DEBUGLOG(5, "ZSTD_decodeLiteralsBlock : %u", (U32)litCSize);
+        DEBUGLOG(5, "ZSTD_decodeLiteralsBlock : cSize=%u, nbLiterals=%zu", (U32)litCSize, dctx->litSize);
         if (ZSTD_isError(litCSize)) return litCSize;
         ip += litCSize;
         srcSize -= litCSize;
@@ -2001,6 +2091,23 @@ ZSTD_decompressBlock_internal(ZSTD_DCtx* dctx,
 
     /* Build Decoding Tables */
     {
+        /* Compute the maximum block size, which must also work when !frame and fParams are unset.
+         * Additionally, take the min with dstCapacity to ensure that the totalHistorySize fits in a size_t.
+         */
+        size_t const blockSizeMax = MIN(dstCapacity, ZSTD_blockSizeMax(dctx));
+        size_t const totalHistorySize = ZSTD_totalHistorySize(ZSTD_maybeNullPtrAdd((BYTE*)dst, blockSizeMax), (BYTE const*)dctx->virtualStart);
+        /* isLongOffset must be true if there are long offsets.
+         * Offsets are long if they are larger than ZSTD_maxShortOffset().
+         * We don't expect that to be the case in 64-bit mode.
+         *
+         * We check here to see if our history is large enough to allow long offsets.
+         * If it isn't, then we can't possibly have (valid) long offsets. If the offset
+         * is invalid, then it is okay to read it incorrectly.
+         *
+         * If isLongOffset is true, then we will later check our decoding table to see
+         * if it is even possible to generate long offsets.
+         */
+        ZSTD_longOffset_e isLongOffset = (ZSTD_longOffset_e)(MEM_32bits() && (totalHistorySize > ZSTD_maxShortOffset()));
         /* These macros control at build-time which decompressor implementation
          * we use. If neither is defined, we do some inspection and dispatch at
          * runtime.
@@ -2008,6 +2115,11 @@ ZSTD_decompressBlock_internal(ZSTD_DCtx* dctx,
 #if !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT) && \
     !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG)
         int usePrefetchDecoder = dctx->ddictIsCold;
+#else
+        /* Set to 1 to avoid computing offset info if we don't need to.
+         * Otherwise this value is ignored.
+         */
+        int usePrefetchDecoder = 1;
 #endif
         int nbSeq;
         size_t const seqHSize = ZSTD_decodeSeqHeaders(dctx, &nbSeq, ip, srcSize);
@@ -2015,40 +2127,55 @@ ZSTD_decompressBlock_internal(ZSTD_DCtx* dctx,
         ip += seqHSize;
         srcSize -= seqHSize;
 
-        RETURN_ERROR_IF(dst == NULL && nbSeq > 0, dstSize_tooSmall, "NULL not handled");
+        RETURN_ERROR_IF((dst == NULL || dstCapacity == 0) && nbSeq > 0, dstSize_tooSmall, "NULL not handled");
+        RETURN_ERROR_IF(MEM_64bits() && sizeof(size_t) == sizeof(void*) && (size_t)(-1) - (size_t)dst < (size_t)(1 << 20), dstSize_tooSmall,
+                "invalid dst");
 
-#if !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT) && \
-    !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG)
-        if ( !usePrefetchDecoder
-          && (!frame || (dctx->fParams.windowSize > (1<<24)))
-          && (nbSeq>ADVANCED_SEQS) ) {  /* could probably use a larger nbSeq limit */
-            U32 const shareLongOffsets = ZSTD_getLongOffsetsShare(dctx->OFTptr);
-            U32 const minShare = MEM_64bits() ? 7 : 20; /* heuristic values, correspond to 2.73% and 7.81% */
-            usePrefetchDecoder = (shareLongOffsets >= minShare);
+        /* If we could potentially have long offsets, or we might want to use the prefetch decoder,
+         * compute information about the share of long offsets, and the maximum nbAdditionalBits.
+         * NOTE: could probably use a larger nbSeq limit
+         */
+        if (isLongOffset || (!usePrefetchDecoder && (totalHistorySize > (1u << 24)) && (nbSeq > 8))) {
+            ZSTD_OffsetInfo const info = ZSTD_getOffsetInfo(dctx->OFTptr, nbSeq);
+            if (isLongOffset && info.maxNbAdditionalBits <= STREAM_ACCUMULATOR_MIN) {
+                /* If isLongOffset, but the maximum number of additional bits that we see in our table is small
+                 * enough, then we know it is impossible to have too long an offset in this block, so we can
+                 * use the regular offset decoder.
+                 */
+                isLongOffset = ZSTD_lo_isRegularOffset;
+            }
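A hypothetical example of the downgrade above (numbers invented for
illustration):

            /* If the block's offset table tops out at
             * info.maxNbAdditionalBits == 20, the largest decodable offBase
             * stays below (1 << 21). That fits one 32-bit bitstream read
             * (20 <= STREAM_ACCUMULATOR_MIN == 25), so even with more than
             * 4 GiB of history the block is safely demoted to the cheaper
             * regular-offset decoder.
             */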
+            if (!usePrefetchDecoder) {
+                U32 const minShare = MEM_64bits() ? 7 : 20; /* heuristic values, correspond to 2.73% and 7.81% */
+                usePrefetchDecoder = (info.longOffsetShare >= minShare);
+            }
         }
-#endif
 
         dctx->ddictIsCold = 0;
 
 #if !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT) && \
     !defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG)
-        if (usePrefetchDecoder)
+        if (usePrefetchDecoder) {
+#else
+        (void)usePrefetchDecoder;
+        {
 #endif
 #ifndef ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT
-            return ZSTD_decompressSequencesLong(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset, frame);
+            return ZSTD_decompressSequencesLong(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset);
 #endif
+        }
 
 #ifndef ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG
         /* else */
         if (dctx->litBufferLocation == ZSTD_split)
-            return ZSTD_decompressSequencesSplitLitBuffer(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset, frame);
+            return ZSTD_decompressSequencesSplitLitBuffer(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset);
         else
-            return ZSTD_decompressSequences(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset, frame);
+            return ZSTD_decompressSequences(dctx, dst, dstCapacity, ip, srcSize, nbSeq, isLongOffset);
 #endif
     }
 }
 
 
+ZSTD_ALLOW_POINTER_OVERFLOW_ATTR
 void ZSTD_checkContinuity(ZSTD_DCtx* dctx, const void* dst, size_t dstSize)
 {
     if (dst != dctx->previousDstEnd && dstSize > 0) {   /* not contiguous */
@@ -2060,13 +2187,24 @@ void ZSTD_checkContinuity(ZSTD_DCtx* dctx, const void* dst, size_t dstSize)
 }
 
 
-size_t ZSTD_decompressBlock(ZSTD_DCtx* dctx,
-                            void* dst, size_t dstCapacity,
-                            const void* src, size_t srcSize)
+size_t ZSTD_decompressBlock_deprecated(ZSTD_DCtx* dctx,
+                                       void* dst, size_t dstCapacity,
+                                       const void* src, size_t srcSize)
 {
     size_t dSize;
+    dctx->isFrameDecompression = 0;
     ZSTD_checkContinuity(dctx, dst, dstCapacity);
-    dSize = ZSTD_decompressBlock_internal(dctx, dst, dstCapacity, src, srcSize, /* frame */ 0, not_streaming);
+    dSize = ZSTD_decompressBlock_internal(dctx, dst, dstCapacity, src, srcSize, not_streaming);
+    FORWARD_IF_ERROR(dSize, "");
     dctx->previousDstEnd = (char*)dst + dSize;
     return dSize;
 }
+
+
+/* NOTE: Must just wrap ZSTD_decompressBlock_deprecated() */
+size_t ZSTD_decompressBlock(ZSTD_DCtx* dctx,
+                            void* dst, size_t dstCapacity,
+                            const void* src, size_t srcSize)
+{
+    return ZSTD_decompressBlock_deprecated(dctx, dst, dstCapacity, src, srcSize);
+}
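For reference, the prefetch thresholds retained above scale against
OffFSELog (8), which is where the percentages in the comment come from
(sketch, restating the in-tree constants):

    /* longOffsetShare is scaled to a denominator of 1 << OffFSELog == 256:
     *   64-bit:  minShare = 7   ->  7 / 256 = 2.73% of offset codes beyond 1<<23
     *   32-bit:  minShare = 20  -> 20 / 256 = 7.81% before prefetching pays off
     */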
diff --git a/lib/zstd/decompress/zstd_decompress_block.h b/lib/zstd/decompress/zstd_decompress_block.h
index 3d2d57a5d25a..becffbd89364 100644
--- a/lib/zstd/decompress/zstd_decompress_block.h
+++ b/lib/zstd/decompress/zstd_decompress_block.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -47,7 +48,7 @@ typedef enum {
  */
 size_t ZSTD_decompressBlock_internal(ZSTD_DCtx* dctx,
                        void* dst, size_t dstCapacity,
-                       const void* src, size_t srcSize, const int frame, const streaming_operation streaming);
+                       const void* src, size_t srcSize, const streaming_operation streaming);
 
 /* ZSTD_buildFSETable() :
  * generate FSE decoding table for one symbol (ll, ml or off)
@@ -64,5 +65,10 @@ void ZSTD_buildFSETable(ZSTD_seqSymbol* dt,
             unsigned tableLog, void* wksp, size_t wkspSize,
             int bmi2);
 
+/* Internal definition of ZSTD_decompressBlock() to avoid deprecation warnings. */
+size_t ZSTD_decompressBlock_deprecated(ZSTD_DCtx* dctx,
+                   void* dst, size_t dstCapacity,
+                   const void* src, size_t srcSize);
+
 
 #endif /* ZSTD_DEC_BLOCK_H */
diff --git a/lib/zstd/decompress/zstd_decompress_internal.h b/lib/zstd/decompress/zstd_decompress_internal.h
index 98102edb6a83..2a225d1811c4 100644
--- a/lib/zstd/decompress/zstd_decompress_internal.h
+++ b/lib/zstd/decompress/zstd_decompress_internal.h
@@ -1,5 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Yann Collet, Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -75,12 +76,13 @@ static UNUSED_ATTR const U32 ML_base[MaxML+1] = {
 
 #define ZSTD_BUILD_FSE_TABLE_WKSP_SIZE (sizeof(S16) * (MaxSeq + 1) + (1u << MaxFSELog) + sizeof(U64))
 #define ZSTD_BUILD_FSE_TABLE_WKSP_SIZE_U32 ((ZSTD_BUILD_FSE_TABLE_WKSP_SIZE + sizeof(U32) - 1) / sizeof(U32))
+#define ZSTD_HUFFDTABLE_CAPACITY_LOG 12
 
 typedef struct {
     ZSTD_seqSymbol LLTable[SEQSYMBOL_TABLE_SIZE(LLFSELog)];    /* Note : Space reserved for FSE Tables */
     ZSTD_seqSymbol OFTable[SEQSYMBOL_TABLE_SIZE(OffFSELog)];   /* is also used as temporary workspace while building hufTable during DDict creation */
     ZSTD_seqSymbol MLTable[SEQSYMBOL_TABLE_SIZE(MLFSELog)];    /* and therefore must be at least HUF_DECOMPRESS_WORKSPACE_SIZE large */
-    HUF_DTable hufTable[HUF_DTABLE_SIZE(HufLog)];  /* can accommodate HUF_decompress4X */
+    HUF_DTable hufTable[HUF_DTABLE_SIZE(ZSTD_HUFFDTABLE_CAPACITY_LOG)];  /* can accommodate HUF_decompress4X */
     U32 rep[ZSTD_REP_NUM];
     U32 workspace[ZSTD_BUILD_FSE_TABLE_WKSP_SIZE_U32];
 } ZSTD_entropyDTables_t;
@@ -135,7 +137,7 @@ struct ZSTD_DCtx_s
     const void* virtualStart;     /* virtual start of previous segment if it was just before current one */
     const void* dictEnd;          /* end of previous segment */
     size_t expected;
-    ZSTD_frameHeader fParams;
+    ZSTD_FrameHeader fParams;
     U64 processedCSize;
     U64 decodedSize;
     blockType_e bType;            /* used in ZSTD_decompressContinue(), store blockType between block header decoding and block decompression stages */
@@ -152,7 +154,8 @@ struct ZSTD_DCtx_s
     size_t litSize;
     size_t rleSize;
     size_t staticSize;
-#if DYNAMIC_BMI2 != 0
+    int isFrameDecompression;
+#if DYNAMIC_BMI2
     int bmi2;                     /* == 1 if the CPU supports BMI2 and 0 otherwise. CPU support is determined dynamically once per context lifetime. */
 #endif
 
@@ -164,6 +167,8 @@ struct ZSTD_DCtx_s
     ZSTD_dictUses_e dictUses;
     ZSTD_DDictHashSet* ddictSet;                    /* Hash set for multiple ddicts */
     ZSTD_refMultipleDDicts_e refMultipleDDicts;     /* User specified: if == 1, will allow references to multiple DDicts. Default == 0 (disabled) */
+    int disableHufAsm;
+    int maxBlockSizeParam;
 
     /* streaming */
     ZSTD_dStreamStage streamStage;
@@ -199,11 +204,11 @@ struct ZSTD_DCtx_s
 };  /* typedef'd to ZSTD_DCtx within "zstd.h" */
 
 MEM_STATIC int ZSTD_DCtx_get_bmi2(const struct ZSTD_DCtx_s *dctx) {
-#if DYNAMIC_BMI2 != 0
-	return dctx->bmi2;
+#if DYNAMIC_BMI2
+    return dctx->bmi2;
 #else
     (void)dctx;
-	return 0;
+    return 0;
 #endif
 }
 
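One sizing note on the new ZSTD_HUFFDTABLE_CAPACITY_LOG above, assuming
upstream's HUF_DTABLE_SIZE definition in huf.h:

    /* HUF_DTABLE_SIZE(maxTableLog) expands to (1 + (1 << maxTableLog)), so
     * HUF_DTABLE_SIZE(12) = 4097 HUF_DTable (U32) entries, roughly 16 KiB.
     * Pinning the capacity at 12 keeps the DCtx layout stable even if
     * upstream later changes HufLog.
     */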
diff --git a/lib/zstd/decompress_sources.h b/lib/zstd/decompress_sources.h
index a06ca187aab5..8a47eb2a4514 100644
--- a/lib/zstd/decompress_sources.h
+++ b/lib/zstd/decompress_sources.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause */
 /*
- * Copyright (c) Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
diff --git a/lib/zstd/zstd_common_module.c b/lib/zstd/zstd_common_module.c
index 22686e367e6f..466828e35752 100644
--- a/lib/zstd/zstd_common_module.c
+++ b/lib/zstd/zstd_common_module.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
 /*
- * Copyright (c) Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -24,9 +24,6 @@ EXPORT_SYMBOL_GPL(HUF_readStats_wksp);
 EXPORT_SYMBOL_GPL(ZSTD_isError);
 EXPORT_SYMBOL_GPL(ZSTD_getErrorName);
 EXPORT_SYMBOL_GPL(ZSTD_getErrorCode);
-EXPORT_SYMBOL_GPL(ZSTD_customMalloc);
-EXPORT_SYMBOL_GPL(ZSTD_customCalloc);
-EXPORT_SYMBOL_GPL(ZSTD_customFree);
 
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_DESCRIPTION("Zstd Common");
diff --git a/lib/zstd/zstd_compress_module.c b/lib/zstd/zstd_compress_module.c
index bd8784449b31..7651b53551c8 100644
--- a/lib/zstd/zstd_compress_module.c
+++ b/lib/zstd/zstd_compress_module.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
 /*
- * Copyright (c) Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -16,6 +16,7 @@
 
 #include "common/zstd_deps.h"
 #include "common/zstd_internal.h"
+#include "compress/zstd_compress_internal.h"
 
 #define ZSTD_FORWARD_IF_ERR(ret)            \
         do {                                \
@@ -92,12 +93,64 @@ zstd_compression_parameters zstd_get_cparams(int level,
 }
 EXPORT_SYMBOL(zstd_get_cparams);
 
+size_t zstd_cctx_set_param(zstd_cctx *cctx, ZSTD_cParameter param, int value)
+{
+        return ZSTD_CCtx_setParameter(cctx, param, value);
+}
+EXPORT_SYMBOL(zstd_cctx_set_param);
+
 size_t zstd_cctx_workspace_bound(const zstd_compression_parameters *cparams)
 {
         return ZSTD_estimateCCtxSize_usingCParams(*cparams);
 }
 EXPORT_SYMBOL(zstd_cctx_workspace_bound);
 
+// Used by zstd_cctx_workspace_bound_with_ext_seq_prod()
+static size_t dummy_external_sequence_producer(
+        void *sequenceProducerState,
+        ZSTD_Sequence *outSeqs, size_t outSeqsCapacity,
+        const void *src, size_t srcSize,
+        const void *dict, size_t dictSize,
+        int compressionLevel,
+        size_t windowSize)
+{
+        (void)sequenceProducerState;
+        (void)outSeqs; (void)outSeqsCapacity;
+        (void)src; (void)srcSize;
+        (void)dict; (void)dictSize;
+        (void)compressionLevel;
+        (void)windowSize;
+        return ZSTD_SEQUENCE_PRODUCER_ERROR;
+}
+
+static void init_cctx_params_from_compress_params(
+        ZSTD_CCtx_params *cctx_params,
+        const zstd_compression_parameters *compress_params)
+{
+        ZSTD_parameters zstd_params;
+        memset(&zstd_params, 0, sizeof(zstd_params));
+        zstd_params.cParams = *compress_params;
+        ZSTD_CCtxParams_init_advanced(cctx_params, zstd_params);
+}
+
+size_t zstd_cctx_workspace_bound_with_ext_seq_prod(const zstd_compression_parameters *compress_params)
+{
+        ZSTD_CCtx_params cctx_params;
+        init_cctx_params_from_compress_params(&cctx_params, compress_params);
+        ZSTD_CCtxParams_registerSequenceProducer(&cctx_params, NULL, dummy_external_sequence_producer);
+        return ZSTD_estimateCCtxSize_usingCCtxParams(&cctx_params);
+}
+EXPORT_SYMBOL(zstd_cctx_workspace_bound_with_ext_seq_prod);
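A minimal sketch of how the new exports are meant to compose, e.g. for a
QAT-style driver; my_hw_seq_prod, my_state, cparams, and workspace are
invented names here, and zstd_register_sequence_producer() is the wrapper
added further down in this file:

        /* Hypothetical hardware match finder, illustration only. */
        static size_t my_hw_seq_prod(void *state,
                        ZSTD_Sequence *outSeqs, size_t outSeqsCapacity,
                        const void *src, size_t srcSize,
                        const void *dict, size_t dictSize,
                        int compressionLevel, size_t windowSize)
        {
                /* Offload LZ match finding and fill outSeqs, returning the
                 * number of sequences produced; ZSTD_SEQUENCE_PRODUCER_ERROR
                 * makes zstd fall back to its software match finder. */
                return ZSTD_SEQUENCE_PRODUCER_ERROR;
        }

        /* Caller side: size the workspace with the producer accounted for,
         * then register the producer on the live cctx. */
        size_t const bound = zstd_cctx_workspace_bound_with_ext_seq_prod(&cparams);
        zstd_cctx *cctx = zstd_init_cctx(workspace, bound);
        zstd_register_sequence_producer(cctx, &my_state, my_hw_seq_prod);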
+size_t zstd_cstream_workspace_bound_with_ext_seq_prod(const zstd_compression_parameters *compress_params)
+{
+        ZSTD_CCtx_params cctx_params;
+        init_cctx_params_from_compress_params(&cctx_params, compress_params);
+        ZSTD_CCtxParams_registerSequenceProducer(&cctx_params, NULL, dummy_external_sequence_producer);
+        return ZSTD_estimateCStreamSize_usingCCtxParams(&cctx_params);
+}
+EXPORT_SYMBOL(zstd_cstream_workspace_bound_with_ext_seq_prod);
+
 zstd_cctx *zstd_init_cctx(void *workspace, size_t workspace_size)
 {
         if (workspace == NULL)
@@ -209,5 +262,25 @@ size_t zstd_end_stream(zstd_cstream *cstream, zstd_out_buffer *output)
 }
 EXPORT_SYMBOL(zstd_end_stream);
 
+void zstd_register_sequence_producer(
+        zstd_cctx *cctx,
+        void *sequence_producer_state,
+        zstd_sequence_producer_f sequence_producer
+) {
+        ZSTD_registerSequenceProducer(cctx, sequence_producer_state, sequence_producer);
+}
+EXPORT_SYMBOL(zstd_register_sequence_producer);
+
+size_t zstd_compress_sequences_and_literals(zstd_cctx *cctx, void *dst, size_t dst_capacity,
+        const zstd_sequence *in_seqs, size_t in_seqs_size,
+        const void *literals, size_t lit_size, size_t lit_capacity,
+        size_t decompressed_size)
+{
+        return ZSTD_compressSequencesAndLiterals(cctx, dst, dst_capacity, in_seqs,
+                                                 in_seqs_size, literals, lit_size,
+                                                 lit_capacity, decompressed_size);
+}
+EXPORT_SYMBOL(zstd_compress_sequences_and_literals);
+
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_DESCRIPTION("Zstd Compressor");
diff --git a/lib/zstd/zstd_decompress_module.c b/lib/zstd/zstd_decompress_module.c
index 469fc3059be0..0ae819f0c927 100644
--- a/lib/zstd/zstd_decompress_module.c
+++ b/lib/zstd/zstd_decompress_module.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
 /*
- * Copyright (c) Facebook, Inc.
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
  * All rights reserved.
  *
  * This source code is licensed under both the BSD-style license (found in the
@@ -113,7 +113,7 @@ EXPORT_SYMBOL(zstd_init_dstream);
 
 size_t zstd_reset_dstream(zstd_dstream *dstream)
 {
-        return ZSTD_resetDStream(dstream);
+        return ZSTD_DCtx_reset(dstream, ZSTD_reset_session_only);
 }
 EXPORT_SYMBOL(zstd_reset_dstream);
 
-- 
2.48.1