Functable Dispatch Matrix — x86 `-march` Variants

Extracted by inspecting the undefined symbols in functable.c.o for each build. Each undefined symbol is a function the functable references, so the set is exactly what it can assign at runtime. Builds use clang -target x86_64-apple-macos with runtime CPU detection enabled (the default).

`-march` native features

`-march`	SSE2	SSSE3	SSE4.1	SSE4.2	PCLMUL	AVX2	AVX-512	AVX512VNNI	VPCLMUL
x86-64	native	-	-	-	-	-	-	-	-
nehalem	native	native	native	native	-	-	-	-	-
haswell	native	native	native	native	native	native	-	-	-
skylake-avx512	native	native	native	native	native	native	native	-	-
icelake-server	native	native	native	native	native	native	native	native	native

CRC32 dispatch

The CRC32 dispatch chain is the most interesting because it varies across three build configurations (the default plus two compile-time flags):

Default: chorba SSE variants → PCLMULQDQ → VPCLMULQDQ
WITHOUT_CHORBA (-DWITH_CRC32_CHORBA=OFF): braid → PCLMULQDQ → VPCLMULQDQ
WITHOUT_CHORBA_SSE (-DWITHOUT_CHORBA_SSE): generic C chorba → PCLMULQDQ → VPCLMULQDQ

Haswell and above have PCLMULQDQ native, so CRC32_BRAID_FALLBACK is not defined, and all chorba/braid variants are excluded from both compilation and dispatch — the chorba flags have no effect.
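
The flag handling described above can be sketched with the preprocessor. This is a minimal illustration, not the code in functable.c: the enum, both function names, and the way CRC32_BRAID_FALLBACK is derived from X86_PCLMULQDQ_NATIVE are assumptions. Only the macro names and variant names come from these notes.

```c
#include <assert.h>

/* Hypothetical labels for the software fallback family a build ends up with. */
enum crc32_sw_fallback {
    CRC32_SW_NONE,       /* PCLMULQDQ is native, no software fallback at all */
    CRC32_SW_BRAID,      /* WITHOUT_CHORBA */
    CRC32_SW_CHORBA_C,   /* WITHOUT_CHORBA_SSE */
    CRC32_SW_CHORBA_SSE  /* default */
};

/* Assumption: the fallback macro exists only when PCLMULQDQ is not native. */
#if !defined(X86_PCLMULQDQ_NATIVE)
#  define CRC32_BRAID_FALLBACK
#endif

/* Resolved entirely at compile time, mirroring the three configurations. */
static enum crc32_sw_fallback crc32_software_fallback(void) {
#if !defined(CRC32_BRAID_FALLBACK)
    return CRC32_SW_NONE;        /* chorba flags have no effect here */
#elif defined(WITHOUT_CHORBA)
    return CRC32_SW_BRAID;
#elif defined(WITHOUT_CHORBA_SSE)
    return CRC32_SW_CHORBA_C;
#else
    return CRC32_SW_CHORBA_SSE;
#endif
}

/* Maps the label back to the dispatch targets named in the table below. */
static const char *crc32_fallback_name(enum crc32_sw_fallback f) {
    switch (f) {
    case CRC32_SW_NONE:       return "none";
    case CRC32_SW_BRAID:      return "crc32_braid";
    case CRC32_SW_CHORBA_C:   return "crc32_chorba";
    case CRC32_SW_CHORBA_SSE: return "crc32_chorba_sse2 / crc32_chorba_sse41";
    }
    assert(!"unknown fallback value");
    return "?";
}
```

Compiled with no extra defines, the sketch reports the chorba SSE family. Adding -DWITHOUT_CHORBA or -DWITHOUT_CHORBA_SSE changes the result, and -DX86_PCLMULQDQ_NATIVE makes both flags irrelevant.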

Dispatch target	x86-64 Default	x86-64 WITHOUT_CHORBA	x86-64 WITHOUT_CHORBA_SSE	nehalem Default	nehalem WITHOUT_CHORBA	nehalem WITHOUT_CHORBA_SSE	haswell	skylake-avx512	icelake-server
`crc32_braid`	-	Y	-	-	Y	-	-	-	-
`crc32_copy_braid`	-	Y	-	-	Y	-	-	-	-
`crc32_chorba`	-	-	Y	-	-	Y	-	-	-
`crc32_copy_chorba`	-	-	Y	-	-	Y	-	-	-
`crc32_chorba_sse2`	Y	-	-	-	-	-	-	-	-
`crc32_copy_chorba_sse2`	Y	-	-	-	-	-	-	-	-
`crc32_chorba_sse41`	Y	-	-	Y	-	-	-	-	-
`crc32_copy_chorba_sse41`	Y	-	-	Y	-	-	-	-	-
`crc32_pclmulqdq`	Y	Y	Y	Y	Y	Y	Y	Y	-
`crc32_copy_pclmulqdq`	Y	Y	Y	Y	Y	Y	Y	Y	-
`crc32_vpclmulqdq`	Y	Y	Y	Y	Y	Y	Y	Y	Y
`crc32_copy_vpclmulqdq`	Y	Y	Y	Y	Y	Y	Y	Y	Y

Notes:

crc32_chorba_sse2 dispatch is gated by !defined(X86_SSE41_NATIVE) && !defined(X86_PCLMULQDQ_NATIVE), so nehalem (SSE4.1 native) skips it and goes straight to crc32_chorba_sse41.
crc32_chorba_sse41 dispatch is gated by !defined(X86_PCLMULQDQ_NATIVE), so haswell+ never dispatches to any chorba variant.
crc32_pclmulqdq dispatch is gated by !defined(X86_VPCLMULQDQ_NATIVE), so icelake-server skips it and dispatches directly to crc32_vpclmulqdq.
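
The three guards in these notes can be pictured as a chain of conditional assignments. The sketch below is hedged: the cpu_features struct, its field names, the enum, and select_crc32 are hypothetical, and the assumption that later assignments overwrite earlier ones is mine. The guard expressions themselves are the ones quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical runtime detection result; the real struct may differ. */
struct cpu_features {
    int has_sse2;
    int has_sse41;
    int has_pclmulqdq;
    int has_vpclmulqdq;
};

enum crc32_choice {
    CRC32_UNSET,          /* whatever software fallback was assigned earlier */
    CRC32_CHORBA_SSE2,
    CRC32_CHORBA_SSE41,
    CRC32_PCLMULQDQ,
    CRC32_VPCLMULQDQ
};

/* Each tier is compiled in only if no higher tier is guaranteed by -march.
 * Assumption: tiers are tested lowest to highest, so the last match wins. */
static enum crc32_choice select_crc32(const struct cpu_features *cf) {
    enum crc32_choice choice = CRC32_UNSET;
    assert(cf != NULL);
#if !defined(X86_SSE41_NATIVE) && !defined(X86_PCLMULQDQ_NATIVE)
    if (cf->has_sse2)
        choice = CRC32_CHORBA_SSE2;
#endif
#if !defined(X86_PCLMULQDQ_NATIVE)
    if (cf->has_sse41)
        choice = CRC32_CHORBA_SSE41;
#endif
#if !defined(X86_VPCLMULQDQ_NATIVE)
    if (cf->has_pclmulqdq)
        choice = CRC32_PCLMULQDQ;
#endif
    if (cf->has_vpclmulqdq)   /* top tier: no higher guard to exclude it */
        choice = CRC32_VPCLMULQDQ;
    return choice;
}
```

With none of the _NATIVE macros defined (the x86-64 default column), every tier is a candidate and the runtime flags alone decide. Defining X86_PCLMULQDQ_NATIVE removes both chorba branches at compile time, which matches the haswell column.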

Adler32 dispatch

Dispatch target	x86-64	nehalem	haswell	skylake-avx512	icelake-server
`adler32_c`	Y	-	-	-	-
`adler32_ssse3`	Y	Y	-	-	-
`adler32_avx2`	Y	Y	Y	-	-
`adler32_avx512`	Y	Y	Y	Y	-
`adler32_avx512_vnni`	Y	Y	Y	Y	Y

Dispatch target	x86-64	nehalem	haswell	skylake-avx512	icelake-server
`adler32_copy_c`	Y	-	-	-	-
`adler32_copy_ssse3`	Y	-	-	-	-
`adler32_copy_sse42`	Y	Y	-	-	-
`adler32_copy_avx2`	Y	Y	Y	-	-
`adler32_copy_avx512`	Y	Y	Y	Y	-
`adler32_copy_avx512_vnni`	Y	Y	Y	Y	Y

Notes:

adler32_copy_ssse3 only appears at x86-64 baseline. At nehalem, SSE4.2 is native so adler32_copy_sse42 replaces it directly without the ssse3 intermediate.
adler32_c is gated by ADLER32_FALLBACK which requires !X86_SSSE3_NATIVE — only x86-64 baseline.
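
One way to picture the ADLER32_FALLBACK gate is that the same macro removes the C function from compilation and from the functable's initial assignment. In this sketch the Adler-32 loop is a plain reference implementation standing in for adler32_c, the names adler32_c_sketch and initial_adler32 are made up, and deriving the macro from X86_SSSE3_NATIVE in exactly this form is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the fallback is wanted only when SSSE3 is not guaranteed. */
#if !defined(X86_SSSE3_NATIVE)
#  define ADLER32_FALLBACK
#endif

typedef uint32_t (*adler32_fn)(uint32_t adler, const uint8_t *buf, size_t len);

#ifdef ADLER32_FALLBACK
/* Byte-at-a-time Adler-32; a stand-in, not zlib-ng's optimized adler32_c. */
static uint32_t adler32_c_sketch(uint32_t adler, const uint8_t *buf, size_t len) {
    uint32_t a = adler & 0xffffu;
    uint32_t b = (adler >> 16) & 0xffffu;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521u;   /* 65521 is the largest prime below 2^16 */
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}
#endif

/* Initial functable entry before any runtime CPU checks upgrade it. */
static adler32_fn initial_adler32(void) {
#ifdef ADLER32_FALLBACK
    return adler32_c_sketch;
#else
    return NULL;   /* a native SIMD variant would be assigned here instead */
#endif
}
```

When X86_SSSE3_NATIVE is defined, adler32_c_sketch is never compiled, so it cannot show up as a symbol reference. That is the effect the nehalem column records for adler32_c.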

Compare256 dispatch

Dispatch target	x86-64	nehalem	haswell	skylake-avx512	icelake-server
`compare256_sse2`	Y	Y	-	-	-
`compare256_avx2`	Y	Y	Y	-	-
`compare256_avx512`	Y	Y	Y	Y	Y

Note: SSE2 dispatch is gated by !X86_AVX2_NATIVE, so haswell+ skips it.

Chunkmemset_safe dispatch

Dispatch target	x86-64	nehalem	haswell	skylake-avx512	icelake-server
`chunkmemset_safe_sse2`	Y	-	-	-	-
`chunkmemset_safe_ssse3`	Y	Y	-	-	-
`chunkmemset_safe_avx2`	Y	Y	Y	-	-
`chunkmemset_safe_avx512`	Y	Y	Y	Y	Y

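Note: SSSE3 dispatch is gated by !X86_AVX2_NATIVE and AVX2 dispatch by !X86_AVX512_NATIVE. The SSE2 variant already drops out at nehalem, where SSSE3 is native, so its guard is evidently tighter than !X86_AVX2_NATIVE.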

Inflate_fast dispatch

Dispatch target	x86-64	nehalem	haswell	skylake-avx512	icelake-server
`inflate_fast_sse2`	Y	-	-	-	-
`inflate_fast_ssse3`	Y	Y	-	-	-
`inflate_fast_avx2`	Y	Y	Y	-	-
`inflate_fast_avx512`	Y	Y	Y	Y	Y

Note: Same gating as chunkmemset_safe: SSE2 is dispatched only at the x86-64 baseline, SSSE3 is gated by !X86_AVX2_NATIVE, and AVX2 is gated by !X86_AVX512_NATIVE.

Longest_match dispatch

Dispatch target	x86-64	nehalem	haswell	skylake-avx512	icelake-server
`longest_match_sse2`	Y	Y	-	-	-
`longest_match_slow_sse2`	Y	Y	-	-	-
`longest_match_avx2`	Y	Y	Y	-	-
`longest_match_slow_avx2`	Y	Y	Y	-	-
`longest_match_avx512`	Y	Y	Y	Y	Y
`longest_match_slow_avx512`	Y	Y	Y	Y	Y

Slide_hash dispatch

Dispatch target	x86-64	nehalem	haswell	skylake-avx512	icelake-server
`slide_hash_sse2`	Y	Y	-	-	-
`slide_hash_avx2`	Y	Y	Y	Y	Y

Note: slide_hash_avx2 is always dispatched (no higher variant exists). There is no AVX-512 slide_hash implementation.

Summary

The _NATIVE preprocessor guards in functable.c progressively eliminate lower-tier dispatch assignments as the -march level increases:

x86-64 baseline: All SIMD variants dispatched (full runtime detection), plus the adler32 C fallbacks.
nehalem: C fallbacks removed (adler32_c, adler32_copy_c), adler32_copy_ssse3 removed (SSE4.2 native), SSE2 chunkmemset/inflate_fast removed, crc32_chorba_sse2 removed (SSE4.1 native → skip to crc32_chorba_sse41).
haswell: All SSE-tier dispatch removed, all chorba/braid removed (PCLMULQDQ native), only AVX2+ and PCLMULQDQ+ dispatched.
skylake-avx512: AVX2-tier dispatch removed (except slide_hash_avx2); only AVX-512 variants plus PCLMULQDQ and VPCLMULQDQ for CRC32 are dispatched.
icelake-server: Most aggressive — one variant per family. Only avx512_vnni for adler32, avx512 for everything else, vpclmulqdq for CRC32, avx2 for slide_hash.
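
Read together, the tables suggest a simple rule: a variant stays in the functable unless the next tier in its ladder is already guaranteed by -march. The sketch below models that rule at run time and reproduces the chunkmemset_safe and compare256 columns. It is an inference from the tables, not code from zlib-ng, and the enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* ISA tiers in ascending order; illustrative names only. */
enum tier { T_SSE2, T_SSSE3, T_AVX2, T_AVX512 };

/* ladder: the tiers a function family implements, lowest first.
 * native: the highest tier guaranteed at compile time by -march.
 * Returns a bitmask where bit i set means ladder[i] is still dispatched. */
static unsigned dispatched_mask(const enum tier *ladder, size_t n, enum tier native) {
    unsigned mask = 0;
    assert(ladder != NULL && n > 0 && n <= 8 * sizeof(unsigned));
    for (size_t i = 0; i < n; i++) {
        /* Keep rung i if it is the top rung, or if the rung above it
         * is not guaranteed and therefore still needs a runtime check. */
        if (i + 1 == n || ladder[i + 1] > native)
            mask |= 1u << i;
    }
    return mask;
}
```

For the chunkmemset_safe ladder {T_SSE2, T_SSSE3, T_AVX2, T_AVX512}, a native tier of T_SSSE3 (nehalem) clears only the SSE2 bit. For the compare256 ladder {T_SSE2, T_AVX2, T_AVX512}, the same native tier keeps all three, since the rung above SSE2 is AVX2.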

The chorba compile flags work correctly:

WITHOUT_CHORBA: All chorba symbols removed from both compilation and dispatch; crc32_braid becomes the software fallback.
WITHOUT_CHORBA_SSE: SSE2/SSE41 chorba removed; generic C crc32_chorba remains as the software fallback.
Both flags are no-ops at haswell+ since PCLMULQDQ native eliminates CRC32_BRAID_FALLBACK entirely.

nmoinvaz/variant-matrix-pr-2139.md
