New CompressionAlgorithm that emits a standard Zstandard frame: zero/0xFF runs
become RLE_Blocks (like BSHUF_ZSTD_RLE) and literal regions become
Compressed_Blocks with per-block adaptive Huffman literals and no sequences
(Number_of_Sequences=0). Short runs are absorbed into the literal stream;
incompressible literals fall back to Raw_Blocks so the worst case stays within
ZSTD_compressBound.
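
The run/literal split described above can be pictured with a minimal sketch. The names `Segment`, `split_segments` and the threshold `kMinRun` are hypothetical and not taken from the commit; the actual threshold and bookkeeping in JFJochZstdHuffCompressor may differ. Only the type convention (0 = zero run, 1 = 0xFF run, 2 = literals) follows the header's `Seg` comment.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Segment kinds follow the header's Seg::type comment: 0 = zero run, 1 = 0xFF run, 2 = literals.
struct Segment {
    uint8_t type;
    size_t offset;  // start offset in the input
    size_t bytes;   // length in bytes
};

// Hypothetical threshold: runs shorter than this stay in the surrounding literal region.
constexpr size_t kMinRun = 32;

std::vector<Segment> split_segments(const uint8_t *src, size_t n) {
    std::vector<Segment> segs;
    size_t lit_start = 0;  // start of the pending (not yet emitted) literal region
    size_t i = 0;
    while (i < n) {
        const uint8_t v = src[i];
        const bool run_byte = (v == 0x00 || v == 0xFF);
        size_t j = i + 1;
        if (run_byte) {
            while (j < n && src[j] == v) ++j;  // extend the run of identical bytes
        }
        if (run_byte && j - i >= kMinRun) {
            // Flush the literals collected so far, then record the long run.
            if (i > lit_start) segs.push_back({2, lit_start, i - lit_start});
            segs.push_back({static_cast<uint8_t>(v == 0x00 ? 0 : 1), i, j - i});
            lit_start = j;
        }
        // Short runs fall through and remain part of the pending literal region.
        i = j;
    }
    if (n > lit_start) segs.push_back({2, lit_start, n - lit_start});
    return segs;
}
```

In this sketch a long run would map to one or more RLE_Blocks and each literal segment to Compressed_Blocks or Raw_Blocks; splitting segments that exceed the maximum block size is left out.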
The Huffman tree + bitstream are produced by zstd's own HUF_compress{1,4}X_repeat
(the same calls ZSTD_compressLiterals uses); only the frame/block/literals-section
framing is hand-written, with comments citing zstd_compression_format.md so it can
be checked clause by clause. Output decodes with stock ZSTD_decompress, so no
reader changes are needed (decoding is routed the same way as BSHUF_ZSTD).
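
As an illustration of the hand-written framing, here is a minimal sketch of the 3-byte Block_Header as laid out in zstd_compression_format.md. The helper name `make_block_header` and the `BlockType` enum are hypothetical; the commit's own `blk_hdr` has a different signature and may be organised differently.

```cpp
#include <array>
#include <cstdint>

// Block_Type values as defined in zstd_compression_format.md (3 is reserved).
enum class BlockType : uint32_t { Raw = 0, RLE = 1, Compressed = 2 };

// Block_Header is 3 bytes, little-endian:
//   bit 0      Last_Block
//   bits 1-2   Block_Type
//   bits 3-23  Block_Size (for an RLE_Block this is the regenerated size;
//              the block body is then a single byte)
std::array<uint8_t, 3> make_block_header(bool last, BlockType type, uint32_t size) {
    const uint32_t h = (last ? 1u : 0u) | (static_cast<uint32_t>(type) << 1) | (size << 3);
    return {static_cast<uint8_t>(h), static_cast<uint8_t>(h >> 8), static_cast<uint8_t>(h >> 16)};
}
```

For example, a final RLE_Block that regenerates 5 bytes gets the header bytes 0x2B 0x00 0x00, followed by the one byte to repeat.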
On sparse diffraction images this gives ~12% smaller files than bitshuffle/LZ4 at about
the same end-to-end speed, sitting between LZ4 and full ZSTD; for maximum ratio
use BSHUF_ZSTD. Robust on any input: tests round-trip pure zeros, Poisson(10),
Mersenne-Twister noise (checked against the size bound), an extreme-sparsity mask,
and a real lyso image through stock ZSTD_decompress.
API: exposed as "bszstd_rlehuf"; regenerate the Python/TS clients (update_version.sh)
to surface the new value there.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
40 lines
2.1 KiB
C++
// SPDX-FileCopyrightText: 2024 Filip Leonarski, Paul Scherrer Institute <filip.leonarski@psi.ch>
// SPDX-License-Identifier: GPL-3.0-only

#pragma once

#include <cstdint>
#include <cstddef>
#include <vector>

// Produces a STANDARD Zstandard frame from bitshuffled data, decodable by stock ZSTD_decompress:
//   - zero / 0xFF runs -> RLE_Blocks (cheap, like JFJochZstdCompressor)
//   - literal regions -> Compressed_Blocks with Huffman literals (no sequences); short
//     runs are absorbed into the literal stream
//   - incompressible literals -> Raw_Blocks (bounded worst case)
// Faster than full ZSTD (no match search) and better ratio than the plain RLE compressor (it
// entropy-codes the literals). The Huffman table is built/reused per block from that block's own
// literals via zstd's HUF_compress*X_repeat, so it is robust on any input (random, Poisson, masks,
// zeros) with no trained tables.
class JFJochZstdHuffCompressor {
    std::vector<uint8_t> out; // assembled frame
    std::vector<uint8_t> literals; // literal bytes in stream order (incl. absorbed short runs)
    std::vector<uint8_t> hufbuf; // scratch for one Huffman-coded literal chunk
    std::vector<size_t> ctable; // HUF_CElt[] (size_t-aligned)
    std::vector<uint64_t> entwksp; // HUF compression workspace
    unsigned repeat_state = 0; // HUF_repeat across literal blocks within the current frame

    struct Seg { uint8_t type; size_t bytes; size_t lit_off; }; // type 0=run0, 1=runFF, 2=literals
    std::vector<Seg> segs;

    void put_le(uint64_t v, int nbytes);
    size_t blk_hdr(uint32_t type, uint32_t size);
    void emit_run(uint8_t value, size_t nbytes, size_t &last_off);
    void emit_lit_chunk(const uint8_t *lits, size_t n, size_t &last_off);
public:
    JFJochZstdHuffCompressor();
    // src = bitshuffled block (src_size bytes, a multiple of 8). Writes one zstd frame to dst and
    // returns its size. dst must hold at least ZSTD_compressBound(src_size) + 12 bytes.
    size_t Compress(uint8_t *dst, const uint64_t *src, size_t src_size);
};
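
The destination-size requirement stated in the comment on Compress can be sketched as follows. This is an illustration only: `compress_bound_sketch` is a hypothetical stand-in that mirrors the ZSTD_COMPRESSBOUND macro from zstd.h for ordinary input sizes so the snippet compiles without zstd; real code should call ZSTD_compressBound() instead. `make_dst` is likewise a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Stand-in for ZSTD_compressBound(): n + n/256, plus a small margin for inputs under 128 KiB.
// Assumption: mirrors the ZSTD_COMPRESSBOUND macro; prefer the library call in production.
constexpr size_t compress_bound_sketch(size_t n) {
    return n + (n >> 8) + (n < (128u << 10) ? (((128u << 10) - n) >> 11) : 0);
}

// Capacity required by Compress() according to its header comment: bound + 12 bytes.
constexpr size_t dst_capacity(size_t src_size) {
    return compress_bound_sketch(src_size) + 12;
}

// Allocates a destination buffer for one bitshuffled block of src_size bytes.
std::vector<uint8_t> make_dst(size_t src_size) {
    if (src_size % 8 != 0)
        throw std::invalid_argument("src_size must be a multiple of 8");
    return std::vector<uint8_t>(dst_capacity(src_size));
}

// Intended use with the class above (not compiled here, it needs the project sources):
//   JFJochZstdHuffCompressor compressor;
//   std::vector<uint8_t> dst = make_dst(src_size);
//   size_t frame_size = compressor.Compress(dst.data(), src, src_size);
```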