FastLED 3.9.15
FASTLED_FORCE_INLINE FL_IRAM FL_OPTIMIZE_FUNCTION void fl::detail::wave8_transpose_16x4_bf1_pipe4(
    const u8 lanes_a[16],
    const u8 lanes_b[16],
    const u8 lanes_c[16],
    const u8 lanes_d[16],
    u8 W0,
    u8 W1,
    u8 output_a[16 * sizeof(Wave8Byte)],
    u8 output_b[16 * sizeof(Wave8Byte)],
    u8 output_c[16 * sizeof(Wave8Byte)],
    u8 output_d[16 * sizeof(Wave8Byte)])
BF1 + pipe4: 4-position software-pipelined BF1 (#2548 deep-dive).
Combines BF1's algorithmic reduction (one transpose per byte position instead of eight) with pipe4's cross-position instruction-level parallelism (ILP). This was the fastest of all prototypes measured: 1 757 µs/frame versus 9 651 µs/frame for the baseline, a 5.49× speedup.
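The BF1 reduction described above can be illustrated with a minimal, self-contained sketch. It assumes (not confirmed by this page) that W0 and W1 are 8-tick waveform patterns, sent MSB first, for a data bit of 0 and 1 respectively, and that each tick becomes one 16-bit word whose bit l is the level on lane l. Under that assumption the naive version repeats the 16-lane gather for every tick (8 times per data bit), while the BF1-style version gathers each bit-plane once and derives all 8 tick words from it with two masks. TickWords, tick_level(), naive_wave_transpose() and bf1_wave_transpose() are hypothetical names, and the layout is not claimed to match Wave8Byte or spread_transpose16_symbol().

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using u8 = std::uint8_t;
using u16 = std::uint16_t;

// Hypothetical output layout for one byte position: 8 data bits x 8 ticks,
// one 16-bit word per tick where bit l is the level driven on lane l.
using TickWords = std::array<u16, 64>;

// Level of tick `tick` (0..7, MSB first) in an 8-tick waveform pattern.
static inline bool tick_level(u8 pattern, int tick) {
    return (pattern >> (7 - tick)) & 1;
}

// Naive reference: for every tick, gather one waveform bit from each lane,
// so the 16-lane gather is repeated 8 times per data bit.
TickWords naive_wave_transpose(const u8 lanes[16], u8 w0, u8 w1) {
    TickWords out{};
    for (int bit = 0; bit < 8; ++bit) {
        for (int tick = 0; tick < 8; ++tick) {
            u16 word = 0;
            for (int lane = 0; lane < 16; ++lane) {
                const bool data = (lanes[lane] >> (7 - bit)) & 1;
                const bool level = tick_level(data ? w1 : w0, tick);
                word |= static_cast<u16>(level ? (1u << lane) : 0u);
            }
            out[bit * 8 + tick] = word;
        }
    }
    return out;
}

// BF1-style: gather each data bit of all 16 lanes into a plane once, then
// build all 8 tick words of that bit from the plane with two masks.
TickWords bf1_wave_transpose(const u8 lanes[16], u8 w0, u8 w1) {
    TickWords out{};
    for (int bit = 0; bit < 8; ++bit) {
        u16 ones = 0;  // bit l set => lane l sends a 1 for this data bit
        for (int lane = 0; lane < 16; ++lane)
            ones |= static_cast<u16>(((lanes[lane] >> (7 - bit)) & 1) << lane);
        const u16 zeros = static_cast<u16>(~ones);  // lanes sending a 0
        for (int tick = 0; tick < 8; ++tick) {
            const u16 hi1 = tick_level(w1, tick) ? 0xFFFF : 0;
            const u16 hi0 = tick_level(w0, tick) ? 0xFFFF : 0;
            out[bit * 8 + tick] = static_cast<u16>((ones & hi1) | (zeros & hi0));
        }
    }
    return out;
}
```

Both functions produce identical tick words; the BF1-style one simply moves the per-lane bit gathering out of the tick loop.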
Definition at line 464 of file wave8.hpp.
References spread_transpose16_symbol(), fl::W0, and fl::W1.
Referenced by fl::wave8Transpose_16x4_bf1_pipe4().
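The pipe4 half of the description, cross-position ILP over four byte positions (lanes_a..lanes_d), can be sketched in the same hypothetical setting: W0/W1 as 8-tick MSB-first patterns and one 16-bit lane word per tick. Because the four positions share no data, issuing each stage for all four before moving on gives the compiler independent instructions to schedule around load and ALU latencies. This is a simplified stand-in that only interleaves stages and does not reproduce the real pipe4 schedule; gather_plane(), spread_plane(), one_position() and four_positions() are illustrative names, not FastLED functions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using u8 = std::uint8_t;
using u16 = std::uint16_t;

// Hypothetical per-position output: 8 data bits x 8 ticks, one 16-bit word
// per tick where bit l is the level driven on lane l.
using TickWords = std::array<u16, 64>;

// Stage 1: gather data bit `bit` (MSB first) of all 16 lanes into one plane.
static inline u16 gather_plane(const u8 lanes[16], int bit) {
    u16 plane = 0;
    for (int lane = 0; lane < 16; ++lane)
        plane |= static_cast<u16>(((lanes[lane] >> (7 - bit)) & 1) << lane);
    return plane;
}

// Stage 2: expand one plane into 8 tick words using the W0/W1 patterns.
static inline void spread_plane(u16 ones, u8 w0, u8 w1, u16* ticks) {
    const u16 zeros = static_cast<u16>(~ones);  // lanes sending a 0 bit
    for (int tick = 0; tick < 8; ++tick) {
        const u16 hi1 = ((w1 >> (7 - tick)) & 1) ? 0xFFFF : 0;
        const u16 hi0 = ((w0 >> (7 - tick)) & 1) ? 0xFFFF : 0;
        ticks[tick] = static_cast<u16>((ones & hi1) | (zeros & hi0));
    }
}

// One byte position at a time: stage 2 always waits on stage 1.
void one_position(const u8 lanes[16], u8 w0, u8 w1, TickWords& out) {
    for (int bit = 0; bit < 8; ++bit)
        spread_plane(gather_plane(lanes, bit), w0, w1, out.data() + bit * 8);
}

// Four independent positions: each stage is issued for a..d before the
// next stage, so work from different positions can overlap.
void four_positions(const u8* const lanes[4], u8 w0, u8 w1,
                    TickWords* const out[4]) {
    for (int bit = 0; bit < 8; ++bit) {
        u16 plane[4];
        for (int p = 0; p < 4; ++p) plane[p] = gather_plane(lanes[p], bit);
        for (int p = 0; p < 4; ++p)
            spread_plane(plane[p], w0, w1, out[p]->data() + bit * 8);
    }
}
```

four_positions() writes the same tick words as four separate one_position() calls; only the order in which the independent work is issued differs.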