FastLED 3.9.15
FASTLED_FORCE_INLINE FL_IRAM FL_OPTIMIZE_FUNCTION void fl::detail::wave8_transpose_16x2_pipe2(
    const Wave8Byte lane_waves_a[16],
    const Wave8Byte lane_waves_b[16],
    u8 output_a[16 * sizeof(Wave8Byte)],
    u8 output_b[16 * sizeof(Wave8Byte)])
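One consequence of this signature: in C++ an array parameter is adjusted to a pointer, so the `[16]` and `[16 * sizeof(Wave8Byte)]` bounds document the expected buffer sizes but are not enforced by the compiler. The compile-time check below illustrates that rule. `Wave8ByteStandIn` is a hypothetical stand-in, and its 8-byte size is an assumption, not taken from wave8.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

using u8 = std::uint8_t;

// Hypothetical stand-in for Wave8Byte; the 8-byte size is an assumption.
struct Wave8ByteStandIn {
  u8 symbols[8];
};

// A function type spelled with the same parameter shapes as the documented
// signature.
using DocumentedShape = void(const Wave8ByteStandIn[16],
                             const Wave8ByteStandIn[16],
                             u8[16 * sizeof(Wave8ByteStandIn)],
                             u8[16 * sizeof(Wave8ByteStandIn)]);

// Array parameters are adjusted to pointers, so the bounds above are not part
// of the function type and are not checked at call sites.
static_assert(std::is_same<DocumentedShape,
                           void(const Wave8ByteStandIn*, const Wave8ByteStandIn*,
                                u8*, u8*)>::value,
              "array parameters decay to pointers");

// Bytes each output buffer must provide, given the stand-in's size.
constexpr std::size_t kOutputBytesPerPosition = 16 * sizeof(Wave8ByteStandIn);
```

Callers are therefore responsible for passing 16-element input arrays and output buffers of at least 16 * sizeof(Wave8Byte) bytes each.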
Pipe2: transpose 16 lanes × 2 byte positions in a single fused call.
The result is bit-identical to two sequential wave8_transpose_16 calls. The gain comes from interleaving the two independent OR-trees inside the symbol loop, so that the in-order RV32 core on the P4 can fill load-use stall cycles from position A with ALU operations from position B (and vice versa). Measured: +26% per frame versus sequential calls on P4 v1.3 (#2548).
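A minimal sketch of why fusing can stay bit-identical while scheduling better, under stated assumptions: `Wave8ByteSketch` is a hypothetical 8-byte stand-in for `Wave8Byte`, and the output layout (waveform slot t stored as two bytes, lanes 0-7 then lanes 8-15) is assumed rather than taken from wave8.hpp. `transpose16_sketch`, `transpose16x2_pipe2_sketch` and `pipe2_matches_sequential` are hypothetical names, and a linear OR chain stands in for the real OR-tree. The fused version keeps two independent accumulators in the same lane loop, which is the property that lets an in-order core overlap one position's loads with the other position's ALU work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for Wave8Byte: 8 waveform bytes per lane (assumed).
struct Wave8ByteSketch {
  std::uint8_t symbols[8];
};

constexpr std::size_t kLanes = 16;
constexpr std::size_t kOutBytes = kLanes * sizeof(Wave8ByteSketch);  // 128

// Single-position transpose under the assumed layout: for each of the 64
// waveform slots (8 symbols x 8 bits, MSB first), gather one bit per lane
// into a 16-bit word, lane i landing in bit i.
inline void transpose16_sketch(const Wave8ByteSketch lanes[kLanes],
                               std::uint8_t out[kOutBytes]) {
  for (int sym = 0; sym < 8; ++sym) {
    for (int bit = 0; bit < 8; ++bit) {
      std::uint16_t word = 0;
      for (std::size_t lane = 0; lane < kLanes; ++lane) {
        word |= static_cast<std::uint16_t>(
            ((lanes[lane].symbols[sym] >> (7 - bit)) & 1u) << lane);
      }
      const int slot = sym * 8 + bit;                            // 0..63
      out[2 * slot] = static_cast<std::uint8_t>(word & 0xFFu);   // lanes 0-7
      out[2 * slot + 1] = static_cast<std::uint8_t>(word >> 8);  // lanes 8-15
    }
  }
}

// Fused two-position variant: same arithmetic, but positions A and B share
// the loop, giving two independent dependency chains per iteration.
inline void transpose16x2_pipe2_sketch(const Wave8ByteSketch a[kLanes],
                                       const Wave8ByteSketch b[kLanes],
                                       std::uint8_t out_a[kOutBytes],
                                       std::uint8_t out_b[kOutBytes]) {
  for (int sym = 0; sym < 8; ++sym) {
    for (int bit = 0; bit < 8; ++bit) {
      std::uint16_t wa = 0;  // accumulator for position A
      std::uint16_t wb = 0;  // independent accumulator for position B
      for (std::size_t lane = 0; lane < kLanes; ++lane) {
        // A and B share no data, so B's load can be issued while A's
        // load result is still pending (and vice versa).
        const unsigned sa = a[lane].symbols[sym];
        const unsigned sb = b[lane].symbols[sym];
        wa |= static_cast<std::uint16_t>(((sa >> (7 - bit)) & 1u) << lane);
        wb |= static_cast<std::uint16_t>(((sb >> (7 - bit)) & 1u) << lane);
      }
      const int slot = sym * 8 + bit;
      out_a[2 * slot] = static_cast<std::uint8_t>(wa & 0xFFu);
      out_a[2 * slot + 1] = static_cast<std::uint8_t>(wa >> 8);
      out_b[2 * slot] = static_cast<std::uint8_t>(wb & 0xFFu);
      out_b[2 * slot + 1] = static_cast<std::uint8_t>(wb >> 8);
    }
  }
}

// Compares the fused sketch against two sequential single-position calls on
// pseudo-random input (a simple LCG); returns true if every byte matches.
inline bool pipe2_matches_sequential() {
  Wave8ByteSketch a[kLanes];
  Wave8ByteSketch b[kLanes];
  std::uint32_t x = 0x12345678u;
  for (std::size_t i = 0; i < kLanes; ++i) {
    for (int s = 0; s < 8; ++s) {
      x = x * 1664525u + 1013904223u;
      a[i].symbols[s] = static_cast<std::uint8_t>(x >> 24);
      x = x * 1664525u + 1013904223u;
      b[i].symbols[s] = static_cast<std::uint8_t>(x >> 24);
    }
  }
  std::uint8_t seq_a[kOutBytes], seq_b[kOutBytes];
  std::uint8_t fused_a[kOutBytes], fused_b[kOutBytes];
  transpose16_sketch(a, seq_a);
  transpose16_sketch(b, seq_b);
  transpose16x2_pipe2_sketch(a, b, fused_a, fused_b);
  return std::memcmp(seq_a, fused_a, kOutBytes) == 0 &&
         std::memcmp(seq_b, fused_b, kOutBytes) == 0;
}
```

Whether a compiler actually interleaves the two chains depends on its scheduler and flags; the sketch only makes the independence explicit in source, which is the precondition for the stall-filling the description relies on.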
Definition at line 249 of file wave8.hpp.
References spread_transpose16_symbol().
Referenced by fl::wave8Transpose_16x2_pipe2().