FastLED 3.9.15
FL_OPTIMIZE_FUNCTION FL_IRAM void fl::wave8Transpose_16x2_pipe2(
    const u8 (&lanes_a)[16],
    const u8 (&lanes_b)[16],
    const Wave8ByteExpansionLut &lut,
    u8 (&output_a)[16 * sizeof(Wave8Byte)],
    u8 (&output_b)[16 * sizeof(Wave8Byte)])
Pipe2: transposes 16 lanes × 2 byte positions in a single call (#2548).
Bit-identical to two sequential wave8Transpose_16 calls, one per byte position. Internally, the two independent OR-trees are interleaved inside the symbol loop, so the in-order RV32 P4 can fill the load-use stalls of position A with ALU ops from position B. Measured 9655 → 7625 µs/frame on P4 v1.3 (16-lane × 256-LED, byte-LUT path); the reported +26% is the throughput gain (9655 / 7625 ≈ 1.27×), which corresponds to about 21% less time per frame. This is the first variant to beat the 7680 µs WS2812B 16-lane TX target.
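The interleaving idea can be illustrated with a simplified sketch. This is not the FastLED implementation: it omits the Wave8ByteExpansionLut entirely, and the names toy_transpose_16, toy_transpose_16x2 and toy_self_check are hypothetical. It assumes a plain bit transpose (bit s of every lane packed into one 16-bit word) purely to show two independent OR accumulators sharing one loop body while producing the same result as two sequential passes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical toy, NOT FastLED code: for each bit position s (MSB first),
// pack bit s of all 16 lanes into one 16-bit word.
static void toy_transpose_16(const std::uint8_t (&lanes)[16],
                             std::uint16_t (&out)[8]) {
    for (int s = 0; s < 8; ++s) {
        const int shift = 7 - s;
        std::uint16_t acc = 0;
        for (int lane = 0; lane < 16; ++lane) {
            acc = static_cast<std::uint16_t>(
                acc | (((lanes[lane] >> shift) & 1u) << lane));
        }
        out[s] = acc;
    }
}

// Same computation for two byte positions, with both OR accumulators
// advanced in the same loop body. acc_a and acc_b never depend on each
// other, so an in-order core can issue work for B while a load for A is
// still in flight (and vice versa).
static void toy_transpose_16x2(const std::uint8_t (&lanes_a)[16],
                               const std::uint8_t (&lanes_b)[16],
                               std::uint16_t (&out_a)[8],
                               std::uint16_t (&out_b)[8]) {
    for (int s = 0; s < 8; ++s) {
        const int shift = 7 - s;
        std::uint16_t acc_a = 0;
        std::uint16_t acc_b = 0;
        for (int lane = 0; lane < 16; ++lane) {
            const unsigned a = lanes_a[lane];  // load for position A
            const unsigned b = lanes_b[lane];  // load for position B
            acc_a = static_cast<std::uint16_t>(acc_a | (((a >> shift) & 1u) << lane));
            acc_b = static_cast<std::uint16_t>(acc_b | (((b >> shift) & 1u) << lane));
        }
        out_a[s] = acc_a;
        out_b[s] = acc_b;
    }
}

// Checks that the interleaved version matches two sequential calls.
static void toy_self_check() {
    std::uint8_t a[16];
    std::uint8_t b[16];
    for (int i = 0; i < 16; ++i) {
        a[i] = static_cast<std::uint8_t>(i * 17);
        b[i] = static_cast<std::uint8_t>(255 - i * 3);
    }
    std::uint16_t seq_a[8], seq_b[8], pipe_a[8], pipe_b[8];
    toy_transpose_16(a, seq_a);
    toy_transpose_16(b, seq_b);
    toy_transpose_16x2(a, b, pipe_a, pipe_b);
    assert(std::memcmp(seq_a, pipe_a, sizeof seq_a) == 0);
    assert(std::memcmp(seq_b, pipe_b, sizeof seq_b) == 0);
}
```

Whether a compiler actually schedules the two accumulator chains in alternation depends on the target and optimization flags; the sketch only demonstrates that the restructuring leaves the output unchanged.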
Definition at line 136 of file wave8.cpp.hpp.
References FL_RESTRICT_PARAM, fl::detail::wave8_expand_byte(), and fl::detail::wave8_transpose_16x2_pipe2().