Hoist s->bi_buf and s->bi_valid into local variables in compress_block() and pass them by pointer to the emit functions. This eliminates redundant load/store pairs between zng_emit_lit and zng_emit_dist calls within the main compression loop.

Loads and stores of bi_buf/bi_valid in compress_block():

| Architecture | Develop (baseline) | HEAD (registers) | Saved |
|---|---|---|---|
| AArch64 | 4 loads + 4 stores = 8 | 2 loads + 2 stores = 4 | 4 fewer mem ops |
| x86-64 | 4 loads + 4 stores = 8 | 2 loads + 2 stores = 4 | 4 fewer mem ops |

Total instruction count is unchanged on both architectures (169 on AArch64, 166 on x86-64). In place of the removed load/store pairs, the compiler keeps the values in registers, at no added instruction cost.
In develop, each emit function loaded bi_buf/bi_valid from the struct at entry and stored them back at exit. Between consecutive calls in the loop (e.g. zng_emit_lit → zng_emit_dist), this created a store-then-reload round-trip through memory for values that were already in registers.
In HEAD, only 2 accesses remain per field: 1 load at function entry, 1 store at function exit.
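
For readers who want the shape of the change without opening the diff, here is a minimal sketch of the two patterns. It is not the zlib-ng code: `sketch_state`, `put_bits`, `emit_struct`, `emit_local`, `block_baseline`, `block_hoisted`, and `variants_agree` are hypothetical names invented for illustration, and the 9-bit/5-bit "codes" are arbitrary. Only the field names bi_buf and bi_valid come from this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the deflate state (not zlib-ng's struct). */
typedef struct {
    uint8_t  out[64];   /* output bytes */
    size_t   pending;   /* number of bytes written to out */
    uint64_t bi_buf;    /* bit accumulator */
    uint32_t bi_valid;  /* number of valid bits in bi_buf */
} sketch_state;

/* Shared core: append the low `len` bits of `val`, flushing whole bytes to s->out. */
static inline void put_bits(sketch_state *s, uint64_t *buf, uint32_t *valid,
                            uint32_t val, uint32_t len) {
    assert(len >= 1 && len <= 16);
    *buf |= (uint64_t)(val & ((1u << len) - 1u)) << *valid;
    *valid += len;
    while (*valid >= 8) {
        assert(s->pending < sizeof s->out);
        s->out[s->pending++] = (uint8_t)(*buf & 0xff);
        *buf >>= 8;
        *valid -= 8;
    }
}

/* Baseline pattern: every emit call loads from the struct and stores back. */
static void emit_struct(sketch_state *s, uint32_t val, uint32_t len) {
    uint64_t buf = s->bi_buf;        /* load at entry */
    uint32_t valid = s->bi_valid;
    put_bits(s, &buf, &valid, val, len);
    s->bi_buf = buf;                 /* store at exit */
    s->bi_valid = valid;
}

static void block_baseline(sketch_state *s, const uint16_t *syms, size_t n) {
    for (size_t i = 0; i < n; i++) {
        emit_struct(s, syms[i], 9);       /* stands in for a literal/length code */
        emit_struct(s, syms[i] >> 9, 5);  /* stands in for a distance code */
    }
}

/* Hoisted pattern: the emit call works on the caller's locals via pointers. */
static void emit_local(sketch_state *s, uint32_t val, uint32_t len,
                       uint64_t *buf, uint32_t *valid) {
    put_bits(s, buf, valid, val, len);
}

static void block_hoisted(sketch_state *s, const uint16_t *syms, size_t n) {
    uint64_t buf = s->bi_buf;        /* one load per field at function entry */
    uint32_t valid = s->bi_valid;
    for (size_t i = 0; i < n; i++) {
        emit_local(s, syms[i], 9, &buf, &valid);
        emit_local(s, syms[i] >> 9, 5, &buf, &valid);
    }
    s->bi_buf = buf;                 /* one store per field at function exit */
    s->bi_valid = valid;
}

/* Returns 1 when both patterns produce the same bytes and the same final bit state. */
static int variants_agree(void) {
    static const uint16_t syms[] = {0x1234, 0x0fff, 0x3abc, 0x0001};
    sketch_state a, b;
    memset(&a, 0, sizeof a);
    memset(&b, 0, sizeof b);
    block_baseline(&a, syms, 4);
    block_hoisted(&b, syms, 4);
    return a.pending == b.pending && a.bi_buf == b.bi_buf &&
           a.bi_valid == b.bi_valid && memcmp(a.out, b.out, a.pending) == 0;
}
```

Calling `variants_agree()` checks that the two patterns are behaviorally equivalent in this sketch; whether a given compiler actually keeps `buf` and `valid` in registers in the hoisted version depends on inlining and optimization level.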

AArch64, develop (baseline):

    ; mid-loop store-back after emit_lit
    str w3, [x0, #176] ; STORE bi_valid
    str x17, [x0, #168] ; STORE bi_buf
    ; mid-loop reload for emit_dist
    ldr w4, [x0, #176] ; LOAD bi_valid
    ldr x5, [x0, #168] ; LOAD bi_buf
    ; mid-loop store-back after emit_dist / reload for emit_end_block
    ldr w3, [x0, #176] ; LOAD bi_valid
    ldr x17, [x0, #168] ; LOAD bi_buf
    ; final write-back
    str w9, [x0, #176] ; STORE bi_valid
    str x8, [x0, #168] ; STORE bi_buf

AArch64, HEAD (registers):

    ; function entry: load once
    ldr x4, [x0, #168] ; LOAD bi_buf
    ldr w3, [x0, #176] ; LOAD bi_valid
    ; function exit: store once
    str x8, [x0, #168] ; STORE bi_buf
    str w9, [x0, #176] ; STORE bi_valid

x86-64, develop (baseline):

    ; mid-loop store-back after emit_lit
    movl %r8d, 176(%rdi) ; STORE bi_valid
    movq %r12, 168(%rdi) ; STORE bi_buf
    ; mid-loop reload for emit_dist
    movl 176(%rdi), %eax ; LOAD bi_valid
    movq 168(%rdi), %r13 ; LOAD bi_buf
    ; mid-loop store-back after emit_dist / reload for emit_end_block
    movl 176(%rdi), %r8d ; LOAD bi_valid
    movq 168(%rdi), %r12 ; LOAD bi_buf
    ; final write-back
    movl %edx, 176(%rdi) ; STORE bi_valid
    movq %rax, 168(%rdi) ; STORE bi_buf

x86-64, HEAD (registers):

    ; function entry: load once
    movq 168(%rdi), %r13 ; LOAD bi_buf
    movl 176(%rdi), %eax ; LOAD bi_valid
    ; function exit: store once
    movq %rax, 168(%rdi) ; STORE bi_buf
    movl %edx, 176(%rdi) ; STORE bi_valid

Compiled with `clang -O2 -std=c11 -DDISABLE_RUNTIME_CPU_DETECTION -DNDEBUG`:

- AArch64: `-arch arm64` (Apple clang, native)
- x86-64: `-target x86_64-apple-macos -march=x86-64-v2`