FastLED 3.9.15
FASTLED_FORCE_INLINE FL_IRAM FL_OPTIMIZE_FUNCTION void fl::detail::wave8_transpose_16(const Wave8Byte lane_waves[16], u8 output[16 * sizeof(Wave8Byte)])
Transpose 16 lanes of Wave8Byte data into interleaved format.
| lane_waves | Array of 16 Wave8Byte structures |
| output | Output buffer (128 bytes) |
Spread-LUT transpose (#2533): about 1.98× faster than the unrolled naive implementation on the ESP32-P4 (RV32), 6649 → 3353 µs/frame, with bit-exact output. The alternatives tried, a two-pass transpose (two 8x8 blocks via u32), u64 SWAR, and the Hacker's Delight transpose, were all slower than this on the in-order core. The winning shape is a set of independent table lookups followed by shift and OR-reduce, using native u32 operations with no dependency chain. See bit_spread_lut.hpp.
Each 16-lane sample occupies 2 output bytes: low = lanes 0-7, high = lanes 8-15.
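The lookup + shift + OR-reduce shape described above can be illustrated with a minimal sketch. This is not the code in wave8.hpp or bit_spread_lut.hpp: the table kSpread4, the function transpose16_bytes_sketch, the LSB-first sample order, and the choice to write the lanes 0-7 byte first are all assumptions made for illustration. It handles one byte from each of the 16 lanes; a full Wave8Byte would presumably repeat this for each of its bytes.

```cpp
#include <cstdint>

// Hypothetical spread table: bit i of a 4-bit value moves to bit 8*i of a u32,
// so each of the four result bytes carries one input bit in its bit 0.
static const uint32_t kSpread4[16] = {
    0x00000000u, 0x00000001u, 0x00000100u, 0x00000101u,
    0x00010000u, 0x00010001u, 0x00010100u, 0x00010101u,
    0x01000000u, 0x01000001u, 0x01000100u, 0x01000101u,
    0x01010000u, 0x01010001u, 0x01010100u, 0x01010101u,
};

// Transposes one byte from each of 16 lanes into 8 two-byte samples.
// Assumed layout: sample k collects bit k of every lane;
// out[2*k] holds lanes 0-7 and out[2*k + 1] holds lanes 8-15.
static void transpose16_bytes_sketch(const uint8_t in[16], uint8_t out[16]) {
    // acc[0]: lanes 0-7,  bits 0-3    acc[1]: lanes 0-7,  bits 4-7
    // acc[2]: lanes 8-15, bits 0-3    acc[3]: lanes 8-15, bits 4-7
    uint32_t acc[4] = {0, 0, 0, 0};
    for (int lane = 0; lane < 8; ++lane) {
        // Each lookup is independent of the others; shifting by the lane
        // index places that lane's bit at position `lane` in every byte.
        acc[0] |= kSpread4[in[lane] & 0x0F] << lane;
        acc[1] |= kSpread4[in[lane] >> 4] << lane;
        acc[2] |= kSpread4[in[lane + 8] & 0x0F] << lane;
        acc[3] |= kSpread4[in[lane + 8] >> 4] << lane;
    }
    for (int k = 0; k < 8; ++k) {
        const int shift = 8 * (k & 3);  // byte within the accumulator
        out[2 * k]     = static_cast<uint8_t>(acc[k >> 2] >> shift);
        out[2 * k + 1] = static_cast<uint8_t>(acc[2 + (k >> 2)] >> shift);
    }
}
```

Because every lookup depends only on its own input byte, the OR-reduce has no serial chain of dependent shifts and masks, which is consistent with the note above about in-order cores. Splitting each byte into nibbles keeps the table at 16 entries and every accumulator within a native 32-bit register.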
Definition at line 231 of file wave8.hpp.
References spread_transpose16_symbol().
Referenced by fl::wave8Transpose_16(), and fl::wave8Transpose_16().