FastLED 3.9.15

wave8_transpose_16()

FASTLED_FORCE_INLINE FL_IRAM FL_OPTIMIZE_FUNCTION void fl::detail::wave8_transpose_16 ( const Wave8Byte lane_waves[16],
u8 output[16 *sizeof(Wave8Byte)] )

Transpose 16 lanes of Wave8Byte data into interleaved format.

Parameters
lane_waves    Array of 16 Wave8Byte structures
output        Output buffer (128 bytes)

Spread-LUT transpose (#2533): about 1.98× faster than the unrolled naive implementation on the ESP32-P4 (RV32), 6649→3353 us/frame, with bit-exact output. The two-pass (two 8x8 via u32), u64 SWAR, and Hacker's Delight variants all lost to it on the in-order core; the winning shape is independent table lookup + shift + OR-reduce (native u32, no dependency chain). See bit_spread_lut.hpp. Each 16-lane sample is 2 output bytes: low = lanes 0-7, high = lanes 8-15.
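
The "table lookup + shift + OR-reduce" shape described above can be sketched as follows. This is a minimal illustration, not the code in bit_spread_lut.hpp: the nibble-indexed table kSpread4, the helper names, the MSB-first pulse order, and the low-byte-first output order are all assumptions made for the example.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical 16-entry spread table (not the one in bit_spread_lut.hpp):
// bit j of the nibble index lands in bit 0 of byte j of the u32 entry.
static const uint32_t kSpread4[16] = {
    0x00000000u, 0x00000001u, 0x00000100u, 0x00000101u,
    0x00010000u, 0x00010001u, 0x00010100u, 0x00010101u,
    0x01000000u, 0x01000001u, 0x01000100u, 0x01000101u,
    0x01010000u, 0x01010001u, 0x01010100u, 0x01010101u,
};

// Transpose 8 lane bytes. Afterwards byte k of `lo` holds bit k (k = 0..3)
// of every lane and byte k of `hi` holds bit k + 4, with lane i at bit i.
static void spread8(const uint8_t l[8], uint32_t& lo, uint32_t& hi) {
    lo = 0;
    hi = 0;
    for (int i = 0; i < 8; ++i) {
        // Every lookup is independent; only the OR accumulates results.
        lo |= kSpread4[l[i] & 0x0F] << i;
        hi |= kSpread4[l[i] >> 4] << i;
    }
}

// One symbol: 16 lane bytes in, 8 pulses x 2 bytes out.
// Assumptions: pulse 0 is the MSB, and the low byte (lanes 0-7) comes first.
void sketch_transpose16_symbol(const uint8_t l[16], uint8_t out[16]) {
    uint32_t a_lo, a_hi, b_lo, b_hi;
    spread8(l, a_lo, a_hi);      // lanes 0-7
    spread8(l + 8, b_lo, b_hi);  // lanes 8-15
    for (int pulse = 0; pulse < 8; ++pulse) {
        const int bit = 7 - pulse;
        const int shift = (bit & 3) * 8;
        out[pulse * 2]     = static_cast<uint8_t>(((bit < 4) ? a_lo : a_hi) >> shift);
        out[pulse * 2 + 1] = static_cast<uint8_t>(((bit < 4) ? b_lo : b_hi) >> shift);
    }
}

// Naive bit-by-bit reference with the same assumed layout.
void naive_transpose16_symbol(const uint8_t l[16], uint8_t out[16]) {
    for (int pulse = 0; pulse < 8; ++pulse) {
        const int bit = 7 - pulse;
        uint16_t word = 0;
        for (int lane = 0; lane < 16; ++lane) {
            word = static_cast<uint16_t>(word | (((l[lane] >> bit) & 1) << lane));
        }
        out[pulse * 2]     = static_cast<uint8_t>(word & 0xFF);
        out[pulse * 2 + 1] = static_cast<uint8_t>(word >> 8);
    }
}

// Returns true when the sketch and the naive reference agree on `l`.
bool sketch_matches_naive(const uint8_t l[16]) {
    uint8_t a[16], b[16];
    sketch_transpose16_symbol(l, a);
    naive_transpose16_symbol(l, b);
    return std::memcmp(a, b, sizeof(a)) == 0;
}
```

The sketch uses only 32-bit loads, shifts, and ORs, which matches the "native u32" note above; whether this particular table size is the one that produced the measured speedup is not something this page states.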

Definition at line 231 of file wave8.hpp.

{
    for (int symbol_idx = 0; symbol_idx < 8; symbol_idx++) {
        u8 l[16];
        for (int lane = 0; lane < 16; lane++) {
            l[lane] = lane_waves[lane].symbols[symbol_idx].data;
        }
        spread_transpose16_symbol(l, output + symbol_idx * 16);
    }
}
FASTLED_FORCE_INLINE FL_IRAM FL_OPTIMIZE_FUNCTION void spread_transpose16_symbol(const u8 l[16], u8 out[16])
Transpose one symbol of 16 lanes (16 input bytes) into 16 output bytes: 8 pulses × 2 bytes,...
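
To make the 128-byte output layout concrete, the offset arithmetic can be written down as a small helper. This is a hypothetical illustration only: output_pos and OutputPos are not part of FastLED, and the ordering within a symbol (pulse-major, low byte before high byte, lane i at bit i % 8) is an assumption consistent with the description here rather than something this page confirms.

```cpp
#include <cstddef>

// Position of one lane's bit inside the 128-byte output buffer.
struct OutputPos {
    std::size_t byte;  // offset into output[]
    unsigned bit;      // bit index within that byte
};

// Hypothetical helper. Each symbol occupies 16 bytes (output + symbol * 16),
// each pulse occupies 2 bytes, and lanes 8-15 go to the second byte.
constexpr OutputPos output_pos(std::size_t symbol, std::size_t pulse, std::size_t lane) {
    return OutputPos{symbol * 16 + pulse * 2 + (lane >> 3),
                     static_cast<unsigned>(lane & 7)};
}

// The last lane of the last pulse of the last symbol ends the buffer.
static_assert(output_pos(7, 7, 15).byte == 127, "buffer is 128 bytes");
```

For example, under these assumptions lane 9 of pulse 3 in symbol 2 lands in byte 39, bit 1.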

References spread_transpose16_symbol().

Referenced by fl::wave8Transpose_16().
