FastLED 3.9.15
FL_OPTIMIZE_FUNCTION FL_IRAM void fl::wave8Transpose_16x4_pipe4(
    const u8 (&lanes_a)[16],
    const u8 (&lanes_b)[16],
    const u8 (&lanes_c)[16],
    const u8 (&lanes_d)[16],
    const Wave8ByteExpansionLut &lut,
    u8 (&output_a)[16 * sizeof(Wave8Byte)],
    u8 (&output_b)[16 * sizeof(Wave8Byte)],
    u8 (&output_c)[16 * sizeof(Wave8Byte)],
    u8 (&output_d)[16 * sizeof(Wave8Byte)])
Pipe4: transposes 16 lanes × 4 byte positions in a single call (#2548).
The output is bit-identical to four sequential wave8Transpose_16 calls. This variant is the peak of the cross-position ILP curve on RV32 P4: pipe2 = +26%, pipe3 = +36%, pipe4 = +41% vs baseline (9 651 → 6 822 µs/frame, a 1.41× speedup). pipe6 / pipe8 regress to 94% because they exceed the 32-GPR budget and the compiler spills registers. The measured 6 822 µs/frame is about 11% under the 7 680 µs WS2812B TX target, a comfortable margin for ISR-chunked streaming.
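The "bit-identical to four sequential calls" property and the interleaving idea can be illustrated with a small stand-alone sketch. This is not FastLED's implementation: `toy_transpose_16`, `toy_transpose_16x4`, `pick`, and the 16-byte output layout are hypothetical stand-ins, and the wave8 LUT expansion is omitted entirely. The sketch only shows the general pattern, assuming a plain 16-lane bit transpose: the 4-position variant keeps four independent accumulator chains in one loop body, giving the compiler independent work to schedule, while producing exactly the same bytes as four separate calls.

```cpp
#include <cstdint>
#include <cstring>

using u8 = std::uint8_t;

// Hypothetical helper: bit `sh` of `v`, moved to bit position `lane` (0..7).
static inline u8 pick(u8 v, int sh, int lane) {
    return static_cast<u8>(((v >> sh) & 1u) << lane);
}

// Toy single-position transpose (NOT the FastLED routine, no LUT expansion).
// out[2*bit] packs bit (7 - bit) of lanes 0..7, out[2*bit + 1] of lanes 8..15.
void toy_transpose_16(const u8 (&lanes)[16], u8 (&out)[16]) {
    for (int bit = 0; bit < 8; ++bit) {
        const int sh = 7 - bit;
        u8 lo = 0, hi = 0;
        for (int lane = 0; lane < 8; ++lane) {
            lo |= pick(lanes[lane], sh, lane);
            hi |= pick(lanes[lane + 8], sh, lane);
        }
        out[2 * bit] = lo;
        out[2 * bit + 1] = hi;
    }
}

// Toy 4-position variant: the same arithmetic, but the four byte positions
// are interleaved so each loop iteration carries independent accumulators.
void toy_transpose_16x4(const u8 (&a)[16], const u8 (&b)[16],
                        const u8 (&c)[16], const u8 (&d)[16],
                        u8 (&oa)[16], u8 (&ob)[16],
                        u8 (&oc)[16], u8 (&od)[16]) {
    for (int bit = 0; bit < 8; ++bit) {
        const int sh = 7 - bit;
        u8 a_lo = 0, a_hi = 0, b_lo = 0, b_hi = 0;
        u8 c_lo = 0, c_hi = 0, d_lo = 0, d_hi = 0;
        for (int lane = 0; lane < 8; ++lane) {
            a_lo |= pick(a[lane], sh, lane);  a_hi |= pick(a[lane + 8], sh, lane);
            b_lo |= pick(b[lane], sh, lane);  b_hi |= pick(b[lane + 8], sh, lane);
            c_lo |= pick(c[lane], sh, lane);  c_hi |= pick(c[lane + 8], sh, lane);
            d_lo |= pick(d[lane], sh, lane);  d_hi |= pick(d[lane + 8], sh, lane);
        }
        oa[2 * bit] = a_lo;  oa[2 * bit + 1] = a_hi;
        ob[2 * bit] = b_lo;  ob[2 * bit + 1] = b_hi;
        oc[2 * bit] = c_lo;  oc[2 * bit + 1] = c_hi;
        od[2 * bit] = d_lo;  od[2 * bit + 1] = d_hi;
    }
}

// Compares the interleaved variant against four sequential calls.
bool toy_pipe4_matches_sequential() {
    u8 in[4][16];
    for (int p = 0; p < 4; ++p)
        for (int i = 0; i < 16; ++i)
            in[p][i] = static_cast<u8>(37 * i + 101 * p + 13);

    u8 seq[4][16];
    u8 pipe[4][16];
    for (int p = 0; p < 4; ++p) toy_transpose_16(in[p], seq[p]);
    toy_transpose_16x4(in[0], in[1], in[2], in[3],
                       pipe[0], pipe[1], pipe[2], pipe[3]);
    return std::memcmp(seq, pipe, sizeof(seq)) == 0;
}
```

Taking each buffer as a reference to a fixed-size array, as the documented signature does, lets the compiler reject wrongly sized buffers at the call site. Whether interleaving actually helps depends on register pressure: as the figures above note, widening beyond four positions caused spills on the measured target.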
Definition at line 215 of file wave8.cpp.hpp.
References FL_RESTRICT_PARAM, fl::detail::wave8_expand_byte(), and fl::detail::wave8_transpose_16x4_pipe4().