FastLED 3.9.15

◆ wave8Transpose_16x4_pipe4()

FL_OPTIMIZE_FUNCTION FL_IRAM void fl::wave8Transpose_16x4_pipe4(
    const u8 (&lanes_a)[16],
    const u8 (&lanes_b)[16],
    const u8 (&lanes_c)[16],
    const u8 (&lanes_d)[16],
    const Wave8ByteExpansionLut &lut,
    u8 (&output_a)[16 * sizeof(Wave8Byte)],
    u8 (&output_b)[16 * sizeof(Wave8Byte)],
    u8 (&output_c)[16 * sizeof(Wave8Byte)],
    u8 (&output_d)[16 * sizeof(Wave8Byte)])

Pipe4: transpose 16-lane × 4-byte-positions (#2548).

The output is bit-identical to four sequential wave8Transpose_16 calls. This variant is the peak of the cross-position ILP curve on RV32 P4: relative to the baseline, pipe2 = +26%, pipe3 = +36%, and pipe4 = +41% (9 651 → 6 822 µs/frame). pipe6 and pipe8 regress to 94% because they exceed the 32-GPR budget and the compiler spills. The measured 6 822 µs/frame is 11% under the 7 680 µs WS2812B TX target, a comfortable margin for ISR-chunked streaming.
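The percentages quoted above can be reproduced from the frame times given in the text. The following is a minimal sketch of that arithmetic, assuming the "+41%" figure is a throughput ratio (baseline time divided by pipe4 time) rather than a reduction in frame time. The helper names are illustrative and are not part of FastLED.

```cpp
#include <cassert>

// Throughput gain in percent when the frame time drops from before_us to after_us.
// Assumption: this is how the "+41%" figure in the text is defined.
constexpr double throughput_gain_percent(double before_us, double after_us) {
    return (before_us / after_us - 1.0) * 100.0;
}

// Reduction in frame time, in percent of the original frame time.
constexpr double frame_time_reduction_percent(double before_us, double after_us) {
    return (1.0 - after_us / before_us) * 100.0;
}

// How far measured_us sits below budget_us, in percent of the budget.
constexpr double margin_under_percent(double measured_us, double budget_us) {
    return (1.0 - measured_us / budget_us) * 100.0;
}

// Runtime check using only the figures quoted in the text.
inline void check_reported_figures() {
    assert(throughput_gain_percent(9651.0, 6822.0) > 41.0);
    assert(margin_under_percent(6822.0, 7680.0) > 11.0);
}
```

With the figures from the text, throughput_gain_percent(9651, 6822) is about 41.5 and margin_under_percent(6822, 7680) is about 11.2, which matches the quoted +41% and 11%. The same two frame times give a frame-time reduction of about 29.3%, so the two ways of stating the speedup should not be mixed.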

Definition at line 215 of file wave8.cpp.hpp.

{
    Wave8Byte laneWaveformsA[16];
    Wave8Byte laneWaveformsB[16];
    Wave8Byte laneWaveformsC[16];
    Wave8Byte laneWaveformsD[16];
    for (int lane = 0; lane < 16; lane++) {
        detail::wave8_expand_byte(lanes_a[lane], lut, &laneWaveformsA[lane]);
    }
    for (int lane = 0; lane < 16; lane++) {
        detail::wave8_expand_byte(lanes_b[lane], lut, &laneWaveformsB[lane]);
    }
    for (int lane = 0; lane < 16; lane++) {
        detail::wave8_expand_byte(lanes_c[lane], lut, &laneWaveformsC[lane]);
    }
    for (int lane = 0; lane < 16; lane++) {
        detail::wave8_expand_byte(lanes_d[lane], lut, &laneWaveformsD[lane]);
    }
    detail::wave8_transpose_16x4_pipe4(laneWaveformsA, laneWaveformsB,
                                       laneWaveformsC, laneWaveformsD,
                                       output_a, output_b, output_c, output_d);
}
FASTLED_FORCE_INLINE FL_IRAM FL_OPTIMIZE_FUNCTION void wave8_transpose_16x4_pipe4(const Wave8Byte lane_waves_a[16], const Wave8Byte lane_waves_b[16], const Wave8Byte lane_waves_c[16], const Wave8Byte lane_waves_d[16], u8 output_a[16 * sizeof(Wave8Byte)], u8 output_b[16 * sizeof(Wave8Byte)], u8 output_c[16 * sizeof(Wave8Byte)], u8 output_d[16 * sizeof(Wave8Byte)])
    Pipe4: transpose 16-lane × 4-byte-positions in one fused call.
    Definition: wave8.hpp:273

FASTLED_FORCE_INLINE FL_IRAM FL_OPTIMIZE_FUNCTION void wave8_expand_byte(u8 byte_value, const Wave8ByteExpansionLut &lut, Wave8Byte *output)
    Byte-indexed expansion (#2526): one indexed 8-byte copy.
    Definition: wave8.hpp:69
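The expand-then-transpose structure of the function body, and the claim that the fused call is bit-identical to four sequential single-position calls, can be illustrated with a self-contained sketch. Everything here is a stand-in: SketchWave8Byte, SketchLut, the MSB-first pulse encoding, and the little-endian 16-bit output layout are assumptions chosen for illustration, and the real types and kernels in wave8.hpp may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins; the real Wave8Byte and Wave8ByteExpansionLut may differ.
struct SketchWave8Byte { std::uint8_t symbols[8]; };
struct SketchLut { SketchWave8Byte entries[256]; };

constexpr int kLanes = 16;
constexpr std::size_t kOutBytes = kLanes * sizeof(SketchWave8Byte);  // 128

// Assumed encoding: data bits MSB-first, one pulse-pattern byte per data bit.
inline SketchLut make_sketch_lut(std::uint8_t zero_pattern, std::uint8_t one_pattern) {
    SketchLut lut{};
    for (int value = 0; value < 256; ++value) {
        for (int bit = 0; bit < 8; ++bit) {
            lut.entries[value].symbols[bit] =
                ((value >> (7 - bit)) & 1) ? one_pattern : zero_pattern;
        }
    }
    return lut;
}

// One pulse bit of one lane, shifted to that lane's position in the output word.
inline std::uint16_t lane_bit(const SketchWave8Byte& wave, int symbol, int pulse, int lane) {
    return static_cast<std::uint16_t>(((wave.symbols[symbol] >> pulse) & 1u) << lane);
}

// Assumed output layout: one little-endian 16-bit word per pulse slot.
inline void store16(std::uint8_t* out, std::size_t index, std::uint16_t word) {
    out[index] = static_cast<std::uint8_t>(word & 0xFFu);
    out[index + 1] = static_cast<std::uint8_t>(word >> 8);
}

// Reference path: expand and transpose a single byte position.
inline void transpose_one(const std::uint8_t (&lanes)[kLanes], const SketchLut& lut,
                          std::uint8_t (&out)[kOutBytes]) {
    SketchWave8Byte waves[kLanes];
    for (int lane = 0; lane < kLanes; ++lane) waves[lane] = lut.entries[lanes[lane]];
    std::size_t index = 0;
    for (int symbol = 0; symbol < 8; ++symbol) {
        for (int pulse = 7; pulse >= 0; --pulse, index += 2) {
            std::uint16_t word = 0;
            for (int lane = 0; lane < kLanes; ++lane) {
                word |= lane_bit(waves[lane], symbol, pulse, lane);
            }
            store16(out, index, word);
        }
    }
}

// Fused path: four byte positions share one loop nest (independent work per position).
inline void transpose_pipe4(const std::uint8_t (&a)[kLanes], const std::uint8_t (&b)[kLanes],
                            const std::uint8_t (&c)[kLanes], const std::uint8_t (&d)[kLanes],
                            const SketchLut& lut,
                            std::uint8_t (&out_a)[kOutBytes], std::uint8_t (&out_b)[kOutBytes],
                            std::uint8_t (&out_c)[kOutBytes], std::uint8_t (&out_d)[kOutBytes]) {
    SketchWave8Byte wa[kLanes], wb[kLanes], wc[kLanes], wd[kLanes];
    for (int lane = 0; lane < kLanes; ++lane) {
        wa[lane] = lut.entries[a[lane]];
        wb[lane] = lut.entries[b[lane]];
        wc[lane] = lut.entries[c[lane]];
        wd[lane] = lut.entries[d[lane]];
    }
    std::size_t index = 0;
    for (int symbol = 0; symbol < 8; ++symbol) {
        for (int pulse = 7; pulse >= 0; --pulse, index += 2) {
            std::uint16_t word_a = 0, word_b = 0, word_c = 0, word_d = 0;
            for (int lane = 0; lane < kLanes; ++lane) {
                word_a |= lane_bit(wa[lane], symbol, pulse, lane);
                word_b |= lane_bit(wb[lane], symbol, pulse, lane);
                word_c |= lane_bit(wc[lane], symbol, pulse, lane);
                word_d |= lane_bit(wd[lane], symbol, pulse, lane);
            }
            store16(out_a, index, word_a);
            store16(out_b, index, word_b);
            store16(out_c, index, word_c);
            store16(out_d, index, word_d);
        }
    }
}

// Compare the fused path against four sequential single-position calls.
inline bool pipe4_matches_sequential(const SketchLut& lut) {
    std::uint8_t a[kLanes], b[kLanes], c[kLanes], d[kLanes];
    for (int lane = 0; lane < kLanes; ++lane) {
        a[lane] = static_cast<std::uint8_t>(lane * 17);
        b[lane] = static_cast<std::uint8_t>(255 - lane);
        c[lane] = static_cast<std::uint8_t>(lane * lane);
        d[lane] = static_cast<std::uint8_t>(0xA5 ^ lane);
    }
    std::uint8_t seq[4][kOutBytes], fused[4][kOutBytes];
    transpose_one(a, lut, seq[0]);
    transpose_one(b, lut, seq[1]);
    transpose_one(c, lut, seq[2]);
    transpose_one(d, lut, seq[3]);
    transpose_pipe4(a, b, c, d, lut, fused[0], fused[1], fused[2], fused[3]);
    return std::memcmp(seq, fused, sizeof(seq)) == 0;
}
```

Calling pipe4_matches_sequential(make_sketch_lut(0xC0, 0xFC)) exercises both paths on the same inputs; the two pattern bytes are arbitrary placeholders, not WS2812B timing values. The sketch only mirrors the structure of the fused call. It makes no attempt to reproduce the register-level scheduling that yields the measured speedup.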

References FL_RESTRICT_PARAM, fl::detail::wave8_expand_byte(), and fl::detail::wave8_transpose_16x4_pipe4().
