@nmoinvaz
Created February 18, 2026 05:42

Zlib-ng variant matrix for PR 2139

Functable Dispatch Matrix — x86 -march Variants

Extracted by inspecting undefined symbols in functable.c.o for each build — these are the function pointers the functable actually assigns at runtime. Builds use clang -target x86_64-apple-macos with runtime CPU detection enabled (the default).
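As an illustration of the extraction step, the sketch below reduces one line of symbol-table output to the undefined symbol it names. It assumes BSD-style `nm` lines of the form `U _name` and the leading underscore that Mach-O targets add to C symbols. `undefined_symbol` is a hypothetical helper, not part of zlib-ng or of the tooling actually used to build this matrix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the symbol name if `line` is an
 * undefined-symbol entry ("U name"), or NULL otherwise.
 * Assumes BSD-style nm output with no trailing newline. */
const char *undefined_symbol(const char *line) {
    assert(line != NULL);
    /* Undefined entries have a blank address column; skip it. */
    line += strspn(line, " \t");
    if (strncmp(line, "U ", 2) != 0)
        return NULL;            /* defined symbol or unrelated line */
    line += 2;
    if (*line == '_')
        line++;                 /* drop the Mach-O underscore prefix */
    return *line != '\0' ? line : NULL;
}
```

Applied to the symbol listing of functable.c.o from each build, the names that survive this filter (for example adler32_avx2) correspond to the entries marked Y in the dispatch tables.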

-march native features

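| -march | SSE2 | SSSE3 | SSE4.1 | SSE4.2 | PCLMUL | AVX2 | AVX-512 | AVX512VNNI | VPCLMUL |
|---|---|---|---|---|---|---|---|---|---|
| x86-64 | - | - | - | - | - | - | - | - | - |
| nehalem | native | native | native | native | - | - | - | - | - |
| haswell | native | native | native | native | native | native | - | - | - |
| skylake-avx512 | native | native | native | native | native | native | native | - | - |
| icelake-server | native | native | native | native | native | native | native | native | native |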

CRC32 dispatch

The CRC32 dispatch chain is the most interesting because it depends on two compile-time flags, which give three configurations:

  • Default: chorba SSE variants → PCLMULQDQ → VPCLMULQDQ
  • WITHOUT_CHORBA (-DWITH_CRC32_CHORBA=OFF): braid → PCLMULQDQ → VPCLMULQDQ
  • WITHOUT_CHORBA_SSE (-DWITHOUT_CHORBA_SSE): generic C chorba → PCLMULQDQ → VPCLMULQDQ

Haswell and above have PCLMULQDQ native, so CRC32_BRAID_FALLBACK is not defined, and all chorba/braid variants are excluded from both compilation and dispatch — the chorba flags have no effect.

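| Dispatch target | x86-64 Default | x86-64 WITHOUT_CHORBA | x86-64 WITHOUT_CHORBA_SSE | nehalem Default | nehalem WITHOUT_CHORBA | nehalem WITHOUT_CHORBA_SSE | haswell | skylake-avx512 | icelake-server |
|---|---|---|---|---|---|---|---|---|---|
| crc32_braid | - | Y | - | - | Y | - | - | - | - |
| crc32_copy_braid | - | Y | - | - | Y | - | - | - | - |
| crc32_chorba | - | - | Y | - | - | Y | - | - | - |
| crc32_copy_chorba | - | - | Y | - | - | Y | - | - | - |
| crc32_chorba_sse2 | Y | - | - | - | - | - | - | - | - |
| crc32_copy_chorba_sse2 | Y | - | - | - | - | - | - | - | - |
| crc32_chorba_sse41 | Y | - | - | Y | - | - | - | - | - |
| crc32_copy_chorba_sse41 | Y | - | - | Y | - | - | - | - | - |
| crc32_pclmulqdq | Y | Y | Y | Y | Y | Y | Y | Y | - |
| crc32_copy_pclmulqdq | Y | Y | Y | Y | Y | Y | Y | Y | - |
| crc32_vpclmulqdq | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| crc32_copy_vpclmulqdq | Y | Y | Y | Y | Y | Y | Y | Y | Y |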

Notes:

  • crc32_chorba_sse2 dispatch is gated by !defined(X86_SSE41_NATIVE) && !defined(X86_PCLMULQDQ_NATIVE), so nehalem (SSE4.1 native) skips it and goes straight to crc32_chorba_sse41.
  • crc32_chorba_sse41 dispatch is gated by !defined(X86_PCLMULQDQ_NATIVE), so haswell+ never dispatches to any chorba variant.
  • crc32_pclmulqdq dispatch is gated by !defined(X86_VPCLMULQDQ_NATIVE), so icelake-server skips it and dispatches directly to crc32_vpclmulqdq.
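The gating in these notes can be sketched as a small selection function. This is a simplified model, not the code in functable.c: the guard macros are the ones named above, while `struct cpu_features`, `enum crc32_impl`, the `HAS_*` macros and `select_crc32` are hypothetical names introduced for illustration, and the sketch assumes SSE2 is always usable on x86-64. Compiled with no extra defines it models the x86-64 Default column.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the runtime CPU detection results. */
struct cpu_features {
    bool has_sse41;
    bool has_pclmulqdq;
    bool has_vpclmulqdq;
};

/* Hypothetical identifiers for the dispatch targets in the table. */
enum crc32_impl {
    CRC32_NONE, CRC32_BRAID, CRC32_CHORBA, CRC32_CHORBA_SSE2,
    CRC32_CHORBA_SSE41, CRC32_PCLMULQDQ, CRC32_VPCLMULQDQ
};

/* A feature that is native at compile time needs no runtime check. */
#ifdef X86_SSE41_NATIVE
#  define HAS_SSE41(cf) 1
#else
#  define HAS_SSE41(cf) ((cf)->has_sse41)
#endif
#ifdef X86_PCLMULQDQ_NATIVE
#  define HAS_PCLMULQDQ(cf) 1
#else
#  define HAS_PCLMULQDQ(cf) ((cf)->has_pclmulqdq)
#endif
#ifdef X86_VPCLMULQDQ_NATIVE
#  define HAS_VPCLMULQDQ(cf) 1
#else
#  define HAS_VPCLMULQDQ(cf) ((cf)->has_vpclmulqdq)
#endif

/* Later assignments override earlier ones, lowest tier first. */
enum crc32_impl select_crc32(const struct cpu_features *cf) {
    enum crc32_impl impl = CRC32_NONE;
    assert(cf != NULL);

#ifndef X86_PCLMULQDQ_NATIVE      /* software tier exists only here */
#  if defined(WITHOUT_CHORBA)
    impl = CRC32_BRAID;
#  elif defined(WITHOUT_CHORBA_SSE)
    impl = CRC32_CHORBA;
#  else
#    ifndef X86_SSE41_NATIVE
    impl = CRC32_CHORBA_SSE2;     /* skipped when SSE4.1 is native */
#    endif
    if (HAS_SSE41(cf)) impl = CRC32_CHORBA_SSE41;
#  endif
#endif
#ifndef X86_VPCLMULQDQ_NATIVE     /* skipped when VPCLMULQDQ is native */
    if (HAS_PCLMULQDQ(cf)) impl = CRC32_PCLMULQDQ;
#endif
    if (HAS_VPCLMULQDQ(cf)) impl = CRC32_VPCLMULQDQ;
    return impl;
}
```

Building the sketch with `-DWITHOUT_CHORBA`, `-DWITHOUT_CHORBA_SSE` or `-DX86_PCLMULQDQ_NATIVE` changes which branches exist at all, which is the effect the table columns record.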

Adler32 dispatch

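| Dispatch target | x86-64 | nehalem | haswell | skylake-avx512 | icelake-server |
|---|---|---|---|---|---|
| adler32_c | Y | - | - | - | - |
| adler32_ssse3 | Y | Y | - | - | - |
| adler32_avx2 | Y | Y | Y | - | - |
| adler32_avx512 | Y | Y | Y | Y | - |
| adler32_avx512_vnni | Y | Y | Y | Y | Y |

| Dispatch target | x86-64 | nehalem | haswell | skylake-avx512 | icelake-server |
|---|---|---|---|---|---|
| adler32_copy_c | Y | - | - | - | - |
| adler32_copy_ssse3 | Y | - | - | - | - |
| adler32_copy_sse42 | Y | Y | - | - | - |
| adler32_copy_avx2 | Y | Y | Y | - | - |
| adler32_copy_avx512 | Y | Y | Y | Y | - |
| adler32_copy_avx512_vnni | Y | Y | Y | Y | Y |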

Notes:

  • adler32_copy_ssse3 only appears at x86-64 baseline. At nehalem, SSE4.2 is native so adler32_copy_sse42 replaces it directly without the ssse3 intermediate.
  • adler32_c is gated by ADLER32_FALLBACK which requires !X86_SSSE3_NATIVE — only x86-64 baseline.

Compare256 dispatch

| Dispatch target | x86-64 | nehalem | haswell | skylake-avx512 | icelake-server |
|---|---|---|---|---|---|
| compare256_sse2 | Y | Y | - | - | - |
| compare256_avx2 | Y | Y | Y | - | - |
| compare256_avx512 | Y | Y | Y | Y | Y |

Note: SSE2 dispatch is gated by !X86_AVX2_NATIVE, so haswell+ skips it.
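A minimal sketch of why a guarded-out variant vanishes from the symbol list: the function table stores function pointers, and a variant is only referenced if its assignment survives preprocessing. The layout below is an assumption for illustration. `struct functable`, `struct cpu_features` and `init_functable` are hypothetical, the three variants share a scalar stand-in instead of SIMD code, and the AVX2 guard on `X86_AVX512_NATIVE` is inferred from the table rather than stated in the note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar stand-in: counts matching leading bytes, up to 256.
 * The real variants would use SIMD intrinsics instead. */
static uint32_t compare256_scalar(const uint8_t *a, const uint8_t *b) {
    uint32_t len = 0;
    while (len < 256 && a[len] == b[len])
        len++;
    return len;
}

uint32_t compare256_sse2(const uint8_t *a, const uint8_t *b)   { return compare256_scalar(a, b); }
uint32_t compare256_avx2(const uint8_t *a, const uint8_t *b)   { return compare256_scalar(a, b); }
uint32_t compare256_avx512(const uint8_t *a, const uint8_t *b) { return compare256_scalar(a, b); }

/* Hypothetical one-entry function table and CPU feature flags. */
struct functable {
    uint32_t (*compare256)(const uint8_t *, const uint8_t *);
};
struct cpu_features {
    bool has_avx2;
    bool has_avx512;
};

void init_functable(struct functable *ft, const struct cpu_features *cf) {
    assert(ft != NULL && cf != NULL);
#ifndef X86_AVX2_NATIVE
    /* Only referenced when AVX2 is not guaranteed (x86-64, nehalem). */
    ft->compare256 = compare256_sse2;
#endif
#ifndef X86_AVX512_NATIVE
#  ifdef X86_AVX2_NATIVE
    ft->compare256 = compare256_avx2;       /* no runtime check needed */
#  else
    if (cf->has_avx2) ft->compare256 = compare256_avx2;
#  endif
#endif
#ifdef X86_AVX512_NATIVE
    ft->compare256 = compare256_avx512;
#else
    if (cf->has_avx512) ft->compare256 = compare256_avx512;
#endif
}
```

With `-DX86_AVX2_NATIVE` the assignment of `compare256_sse2` is never compiled, so nothing in `init_functable` refers to it, which matches the haswell column.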

Chunkmemset_safe dispatch

| Dispatch target | x86-64 | nehalem | haswell | skylake-avx512 | icelake-server |
|---|---|---|---|---|---|
| chunkmemset_safe_sse2 | Y | - | - | - | - |
| chunkmemset_safe_ssse3 | Y | Y | - | - | - |
| chunkmemset_safe_avx2 | Y | Y | Y | - | - |
| chunkmemset_safe_avx512 | Y | Y | Y | Y | Y |

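Note: SSSE3 dispatch is gated by !X86_AVX2_NATIVE, so haswell+ skips it. SSE2 dispatch already drops out at nehalem, where SSSE3 is native, and AVX2 dispatch is gated by !X86_AVX512_NATIVE.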

Inflate_fast dispatch

| Dispatch target | x86-64 | nehalem | haswell | skylake-avx512 | icelake-server |
|---|---|---|---|---|---|
| inflate_fast_sse2 | Y | - | - | - | - |
| inflate_fast_ssse3 | Y | Y | - | - | - |
| inflate_fast_avx2 | Y | Y | Y | - | - |
| inflate_fast_avx512 | Y | Y | Y | Y | Y |

Note: Same gating as chunkmemset_safe: SSE2 drops out once SSSE3 is native (nehalem), SSSE3 is gated by !X86_AVX2_NATIVE, and AVX2 is gated by !X86_AVX512_NATIVE.

Longest_match dispatch

| Dispatch target | x86-64 | nehalem | haswell | skylake-avx512 | icelake-server |
|---|---|---|---|---|---|
| longest_match_sse2 | Y | Y | - | - | - |
| longest_match_slow_sse2 | Y | Y | - | - | - |
| longest_match_avx2 | Y | Y | Y | - | - |
| longest_match_slow_avx2 | Y | Y | Y | - | - |
| longest_match_avx512 | Y | Y | Y | Y | Y |
| longest_match_slow_avx512 | Y | Y | Y | Y | Y |

Slide_hash dispatch

| Dispatch target | x86-64 | nehalem | haswell | skylake-avx512 | icelake-server |
|---|---|---|---|---|---|
| slide_hash_sse2 | Y | Y | - | - | - |
| slide_hash_avx2 | Y | Y | Y | Y | Y |

Note: There is no AVX-512 slide_hash implementation, so slide_hash_avx2 is dispatched at every -march level.

Summary

The _NATIVE preprocessor guards in functable.c progressively eliminate lower-tier dispatch assignments as the -march level increases:

  • x86-64 baseline: All variants dispatched (full runtime detection).
  • nehalem: C fallbacks removed (adler32_c, chunkmemset_safe_c, etc.), SSE2 chunkmemset/inflate_fast removed, crc32_chorba_sse2 removed (SSE4.1 native → skip to crc32_chorba_sse41).
  • haswell: All SSE-tier dispatch removed, all chorba/braid removed (PCLMULQDQ native), only AVX2+ and PCLMULQDQ+ dispatched.
  • skylake-avx512: AVX2-tier dispatch removed (except slide_hash_avx2); only the AVX-512 variants plus crc32_pclmulqdq and crc32_vpclmulqdq are dispatched.
  • icelake-server: Most aggressive — one variant per family. Only avx512_vnni for adler32, avx512 for everything else, vpclmulqdq for CRC32, avx2 for slide_hash.

The chorba compile flags work correctly:

  • WITHOUT_CHORBA: All chorba symbols removed from both compilation and dispatch; crc32_braid becomes the software fallback.
  • WITHOUT_CHORBA_SSE: SSE2/SSE41 chorba removed; generic C crc32_chorba remains as the software fallback.
  • Both flags are no-ops at haswell+ since PCLMULQDQ native eliminates CRC32_BRAID_FALLBACK entirely.