Extracted by inspecting undefined symbols in functable.c.o for each build — these are the function pointers the functable actually assigns at runtime. Builds use clang -target x86_64-apple-macos with runtime CPU detection enabled (the default).
| `-march` | SSE2 | SSSE3 | SSE4.1 | SSE4.2 | PCLMUL | AVX2 | AVX-512 | AVX512VNNI | VPCLMUL |
|---|---|---|---|---|---|---|---|---|---|
| x86-64 | - | - | - | - | - | - | - | - | - |
| nehalem | native | native | native | native | - | - | - | - | - |
| haswell | native | native | native | native | native | native | - | - | - |
| skylake-avx512 | native | native | native | native | native | native | native | - | - |
| icelake-server | native | native | native | native | native | native | native | native | native |
The CRC32 dispatch chain is the most interesting because it depends on three compile-time configurations:
- Default: chorba SSE variants → PCLMULQDQ → VPCLMULQDQ
- WITHOUT_CHORBA (`-DWITH_CRC32_CHORBA=OFF`): braid → PCLMULQDQ → VPCLMULQDQ
- WITHOUT_CHORBA_SSE (`-DWITHOUT_CHORBA_SSE`): generic C chorba → PCLMULQDQ → VPCLMULQDQ
Haswell and above have PCLMULQDQ native, so CRC32_BRAID_FALLBACK is not defined, and all chorba/braid variants are excluded from both compilation and dispatch — the chorba flags have no effect.
| Dispatch target | x86-64 Default | x86-64 WITHOUT_CHORBA | x86-64 WITHOUT_CHORBA_SSE | nehalem Default | nehalem WITHOUT_CHORBA | nehalem WITHOUT_CHORBA_SSE | haswell | skylake-avx512 | icelake-server |
|---|---|---|---|---|---|---|---|---|---|
| `crc32_braid` | - | Y | - | - | Y | - | - | - | - |
| `crc32_copy_braid` | - | Y | - | - | Y | - | - | - | - |
| `crc32_chorba` | - | - | Y | - | - | Y | - | - | - |
| `crc32_copy_chorba` | - | - | Y | - | - | Y | - | - | - |
| `crc32_chorba_sse2` | Y | - | - | - | - | - | - | - | - |
| `crc32_copy_chorba_sse2` | Y | - | - | - | - | - | - | - | - |
| `crc32_chorba_sse41` | Y | - | - | Y | - | - | - | - | - |
| `crc32_copy_chorba_sse41` | Y | - | - | Y | - | - | - | - | - |
| `crc32_pclmulqdq` | Y | Y | Y | Y | Y | Y | Y | Y | - |
| `crc32_copy_pclmulqdq` | Y | Y | Y | Y | Y | Y | Y | Y | - |
| `crc32_vpclmulqdq` | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| `crc32_copy_vpclmulqdq` | Y | Y | Y | Y | Y | Y | Y | Y | Y |
Notes:
- `crc32_chorba_sse2` dispatch is gated by `!defined(X86_SSE41_NATIVE) && !defined(X86_PCLMULQDQ_NATIVE)`, so nehalem (SSE4.1 native) skips it and goes straight to `crc32_chorba_sse41`.
- `crc32_chorba_sse41` dispatch is gated by `!defined(X86_PCLMULQDQ_NATIVE)`, so haswell+ never dispatches to any chorba variant.
- `crc32_pclmulqdq` dispatch is gated by `!defined(X86_VPCLMULQDQ_NATIVE)`, so icelake-server skips it and dispatches directly to `crc32_vpclmulqdq`.
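
The gating in these notes can be sketched as a small model of guarded dispatch assignments. This is a simplified illustration under stated assumptions, not the actual functable.c code: the `cpu_features` struct, the `crc32_variant` enum, and `pick_crc32` are hypothetical names, and the real runtime feature checks involve more conditions. Only the three `*_NATIVE` guard macros are taken from the notes. Later assignments override earlier ones, so the highest available tier wins; compiling with, for example, `-DX86_PCLMULQDQ_NATIVE` removes both chorba assignments, which corresponds to the haswell column.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of runtime CPU detection (not zlib-ng's real struct). */
struct cpu_features {
    bool has_sse2;
    bool has_sse41;
    bool has_pclmulqdq;
    bool has_vpclmulqdq;
};

/* Hypothetical labels for the CRC32 variants named in the table. */
enum crc32_variant {
    CRC32_UNSET,
    CRC32_CHORBA_SSE2,
    CRC32_CHORBA_SSE41,
    CRC32_PCLMULQDQ,
    CRC32_VPCLMULQDQ
};

/* Each guarded block stands for one dispatch assignment. A *_NATIVE macro
 * implied by -march compiles the lower-tier assignments out entirely. */
enum crc32_variant pick_crc32(const struct cpu_features *cf) {
    assert(cf != NULL);
    enum crc32_variant fn = CRC32_UNSET;

#if !defined(X86_SSE41_NATIVE) && !defined(X86_PCLMULQDQ_NATIVE)
    if (cf->has_sse2)
        fn = CRC32_CHORBA_SSE2;
#endif
#if !defined(X86_PCLMULQDQ_NATIVE)
    if (cf->has_sse41)
        fn = CRC32_CHORBA_SSE41;
#endif
#if !defined(X86_VPCLMULQDQ_NATIVE)
    if (cf->has_pclmulqdq)
        fn = CRC32_PCLMULQDQ;
#endif
    if (cf->has_vpclmulqdq)
        fn = CRC32_VPCLMULQDQ;

    return fn;
}
```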
| Dispatch target | x86-64 | nehalem | haswell | skylake-avx512 | icelake-server |
|---|---|---|---|---|---|
| `adler32_c` | Y | - | - | - | - |
| `adler32_ssse3` | Y | Y | - | - | - |
| `adler32_avx2` | Y | Y | Y | - | - |
| `adler32_avx512` | Y | Y | Y | Y | - |
| `adler32_avx512_vnni` | Y | Y | Y | Y | Y |
| Dispatch target | x86-64 | nehalem | haswell | skylake-avx512 | icelake-server |
|---|---|---|---|---|---|
| `adler32_copy_c` | Y | - | - | - | - |
| `adler32_copy_ssse3` | Y | - | - | - | - |
| `adler32_copy_sse42` | Y | Y | - | - | - |
| `adler32_copy_avx2` | Y | Y | Y | - | - |
| `adler32_copy_avx512` | Y | Y | Y | Y | - |
| `adler32_copy_avx512_vnni` | Y | Y | Y | Y | Y |
Notes:
- `adler32_copy_ssse3` only appears at x86-64 baseline. At nehalem, SSE4.2 is native so `adler32_copy_sse42` replaces it directly without the ssse3 intermediate.
- `adler32_c` is gated by `ADLER32_FALLBACK`, which requires `!X86_SSSE3_NATIVE` (only x86-64 baseline).
| Dispatch target | x86-64 | nehalem | haswell | skylake-avx512 | icelake-server |
|---|---|---|---|---|---|
| `compare256_sse2` | Y | Y | - | - | - |
| `compare256_avx2` | Y | Y | Y | - | - |
| `compare256_avx512` | Y | Y | Y | Y | Y |
Note: SSE2 dispatch is gated by !X86_AVX2_NATIVE, so haswell+ skips it.
| Dispatch target | x86-64 | nehalem | haswell | skylake-avx512 | icelake-server |
|---|---|---|---|---|---|
| `chunkmemset_safe_sse2` | Y | - | - | - | - |
| `chunkmemset_safe_ssse3` | Y | Y | - | - | - |
| `chunkmemset_safe_avx2` | Y | Y | Y | - | - |
| `chunkmemset_safe_avx512` | Y | Y | Y | Y | Y |
Note: SSE2 and SSSE3 dispatch is gated by !X86_AVX2_NATIVE.
| Dispatch target | x86-64 | nehalem | haswell | skylake-avx512 | icelake-server |
|---|---|---|---|---|---|
| `inflate_fast_sse2` | Y | - | - | - | - |
| `inflate_fast_ssse3` | Y | Y | - | - | - |
| `inflate_fast_avx2` | Y | Y | Y | - | - |
| `inflate_fast_avx512` | Y | Y | Y | Y | Y |
Note: Same gating as chunkmemset_safe — SSE2/SSSE3 gated by !X86_AVX2_NATIVE, AVX2 gated by !X86_AVX512_NATIVE.
| Dispatch target | x86-64 | nehalem | haswell | skylake-avx512 | icelake-server |
|---|---|---|---|---|---|
| `longest_match_sse2` | Y | Y | - | - | - |
| `longest_match_slow_sse2` | Y | Y | - | - | - |
| `longest_match_avx2` | Y | Y | Y | - | - |
| `longest_match_slow_avx2` | Y | Y | Y | - | - |
| `longest_match_avx512` | Y | Y | Y | Y | Y |
| `longest_match_slow_avx512` | Y | Y | Y | Y | Y |
| Dispatch target | x86-64 | nehalem | haswell | skylake-avx512 | icelake-server |
|---|---|---|---|---|---|
| `slide_hash_sse2` | Y | Y | - | - | - |
| `slide_hash_avx2` | Y | Y | Y | Y | Y |
Note: slide_hash_avx2 is always dispatched (no higher variant exists). There is no AVX-512 slide_hash implementation.
The _NATIVE preprocessor guards in functable.c progressively eliminate lower-tier dispatch assignments as the -march level increases:
- x86-64 baseline: All variants dispatched (full runtime detection).
- nehalem: C fallbacks removed (`adler32_c`, `chunkmemset_safe_c`, etc.), SSE2 chunkmemset/inflate_fast removed, `crc32_chorba_sse2` removed (SSE4.1 native → skip to `crc32_chorba_sse41`).
- haswell: All SSE-tier dispatch removed, all chorba/braid removed (PCLMULQDQ native), only AVX2+ and PCLMULQDQ+ dispatched.
- skylake-avx512: AVX2-tier dispatch removed (except `slide_hash_avx2`), only AVX-512, PCLMULQDQ, and VPCLMULQDQ dispatched (VPCLMULQDQ is not native at this level, so `crc32_pclmulqdq` stays).
- icelake-server: Most aggressive, with one variant per family: only `avx512_vnni` for adler32, `avx512` for everything else, `vpclmulqdq` for CRC32, `avx2` for slide_hash.
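
As a rough illustration of this progressive elimination, one family can be encoded as data and queried per build level. The sketch below transcribes the adler32 rows from the table earlier in this comment; the `build_level` enum, the `variant_row` struct, and `count_dispatched` are hypothetical names introduced only for this example, and the "dispatched up to a last level" encoding is an assumption that happens to fit the adler32 rows rather than a rule taken from functable.c.

```c
#include <assert.h>
#include <stddef.h>

/* Build levels in the column order used by the tables (hypothetical enum). */
enum build_level {
    LVL_X86_64,
    LVL_NEHALEM,
    LVL_HASWELL,
    LVL_SKYLAKE_AVX512,
    LVL_ICELAKE_SERVER
};

/* One table row: the variant stays dispatched for every build level
 * up to and including last_level. */
struct variant_row {
    const char *name;
    enum build_level last_level;
};

/* Transcribed from the adler32 table. */
const struct variant_row adler32_rows[] = {
    { "adler32_c",           LVL_X86_64 },
    { "adler32_ssse3",       LVL_NEHALEM },
    { "adler32_avx2",        LVL_HASWELL },
    { "adler32_avx512",      LVL_SKYLAKE_AVX512 },
    { "adler32_avx512_vnni", LVL_ICELAKE_SERVER },
};
const size_t adler32_row_count = sizeof(adler32_rows) / sizeof(adler32_rows[0]);

/* Counts how many variants are still dispatched at the given build level. */
size_t count_dispatched(const struct variant_row *rows, size_t n,
                        enum build_level lvl) {
    assert(rows != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (lvl <= rows[i].last_level)
            count++;
    }
    return count;
}
```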
The chorba compile flags work correctly:
- WITHOUT_CHORBA: All chorba symbols removed from both compilation and dispatch; `crc32_braid` becomes the software fallback.
- WITHOUT_CHORBA_SSE: SSE2/SSE41 chorba removed; generic C `crc32_chorba` remains as the software fallback.
- Both flags are no-ops at haswell+ since PCLMULQDQ native eliminates `CRC32_BRAID_FALLBACK` entirely.
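
The effect of the two flags on the software tier can be summarized as a small decision function. This is a sketch that restates the three bullets above as code, not a copy of the build logic: the `crc32_build` struct, the `crc32_sw_tier` enum, and `crc32_software_fallback` are hypothetical names, and giving WITHOUT_CHORBA priority when both flags are set is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one build configuration. */
struct crc32_build {
    bool pclmulqdq_native;    /* -march implies PCLMULQDQ (haswell and above) */
    bool without_chorba;      /* built with WITHOUT_CHORBA */
    bool without_chorba_sse;  /* built with WITHOUT_CHORBA_SSE */
};

/* Which software CRC32 tier sits below PCLMULQDQ in the dispatch chain. */
enum crc32_sw_tier {
    SW_NONE,        /* no chorba/braid variant compiled or dispatched */
    SW_BRAID,       /* crc32_braid */
    SW_CHORBA_C,    /* generic C crc32_chorba */
    SW_CHORBA_SSE   /* crc32_chorba_sse2 and/or crc32_chorba_sse41 */
};

enum crc32_sw_tier crc32_software_fallback(const struct crc32_build *b) {
    assert(b != NULL);
    if (b->pclmulqdq_native)
        return SW_NONE;        /* both flags are no-ops here */
    if (b->without_chorba)
        return SW_BRAID;       /* assumed to win if both flags are set */
    if (b->without_chorba_sse)
        return SW_CHORBA_C;
    return SW_CHORBA_SSE;      /* default build */
}
```

For example, a nehalem build with WITHOUT_CHORBA_SSE maps to `SW_CHORBA_C`, matching the `crc32_chorba` row of the CRC32 table, while any haswell build maps to `SW_NONE` regardless of the flags.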