FastLED 3.9.15

◆ wave8_transpose_16x4_pipe4()

FASTLED_FORCE_INLINE FL_IRAM FL_OPTIMIZE_FUNCTION void fl::detail::wave8_transpose_16x4_pipe4 ( const Wave8Byte lane_waves_a[16],
const Wave8Byte lane_waves_b[16],
const Wave8Byte lane_waves_c[16],
const Wave8Byte lane_waves_d[16],
u8 output_a[16 *sizeof(Wave8Byte)],
u8 output_b[16 *sizeof(Wave8Byte)],
u8 output_c[16 *sizeof(Wave8Byte)],
u8 output_d[16 *sizeof(Wave8Byte)] )

Pipe4: transpose 16-lane × 4-byte-positions in one fused call.

The output is bit-identical to four sequential wave8_transpose_16 calls. This extends the pipe2 idea to 4 byte positions, which is empirically the peak of the curve on the in-order RV32 P4 core (#2548): pipe2 saved 26% over baseline, pipe3 36%, and pipe4 41%, while pipe6 regressed to 94% (register spill). pipe4 stays within the 32-GPR budget: 4 × 4 = 16 OR-tree accumulators + ~8 misc GPRs = ~24 live registers; pipe6 would push this past 32.
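The bit-identical claim can be illustrated with a small standalone sketch. Everything prefixed `Sketch` or `sketch_` below is a hypothetical stand-in, not the FastLED implementation: the types only mirror the `symbols[...].data` layout visible in the listing, and the per-symbol transpose is a naive reference loop whose bit order (pulse 0 taken from the MSB, lane index used as the bit index, low byte first) is an assumption. The real `spread_transpose16_symbol()` may order bits differently and uses an OR-tree rather than nested loops.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for the FastLED types (the real ones live in wave8.hpp).
using u8 = std::uint8_t;
struct SketchSymbol { u8 data; };                     // one symbol = 8 pulse bits
struct SketchWave8Byte { SketchSymbol symbols[8]; };  // 8 symbols per encoded byte
static_assert(sizeof(SketchWave8Byte) == 8, "sketch assumes a packed 8-byte layout");

// Naive reference transpose of one symbol across 16 lanes: 8 pulses x 2 bytes.
// ASSUMPTION: pulse 0 is the MSB of each lane byte, lane N maps to bit N of a
// 16-bit word, and the low byte is stored first. The real bit order may differ.
inline void sketch_transpose16_symbol(const u8 l[16], u8 out[16]) {
    assert(l != nullptr && out != nullptr);
    for (int pulse = 0; pulse < 8; pulse++) {
        std::uint16_t word = 0;
        for (int lane = 0; lane < 16; lane++) {
            const unsigned bit = (l[lane] >> (7 - pulse)) & 1u;
            word = static_cast<std::uint16_t>(word | (bit << lane));
        }
        out[pulse * 2 + 0] = static_cast<u8>(word & 0xFF);
        out[pulse * 2 + 1] = static_cast<u8>(word >> 8);
    }
}

// One byte position: gather each symbol from the 16 lanes, then transpose it.
inline void sketch_transpose_16(const SketchWave8Byte waves[16], u8 output[128]) {
    for (int symbol_idx = 0; symbol_idx < 8; symbol_idx++) {
        u8 l[16];
        for (int lane = 0; lane < 16; lane++) {
            l[lane] = waves[lane].symbols[symbol_idx].data;
        }
        sketch_transpose16_symbol(l, output + symbol_idx * 16);
    }
}

// Four byte positions fused into one symbol loop, mirroring the listing's shape.
inline void sketch_transpose_16x4(const SketchWave8Byte a[16], const SketchWave8Byte b[16],
                                  const SketchWave8Byte c[16], const SketchWave8Byte d[16],
                                  u8 oa[128], u8 ob[128], u8 oc[128], u8 od[128]) {
    for (int symbol_idx = 0; symbol_idx < 8; symbol_idx++) {
        u8 la[16], lb[16], lc[16], ld[16];
        for (int lane = 0; lane < 16; lane++) {
            la[lane] = a[lane].symbols[symbol_idx].data;
            lb[lane] = b[lane].symbols[symbol_idx].data;
            lc[lane] = c[lane].symbols[symbol_idx].data;
            ld[lane] = d[lane].symbols[symbol_idx].data;
        }
        sketch_transpose16_symbol(la, oa + symbol_idx * 16);
        sketch_transpose16_symbol(lb, ob + symbol_idx * 16);
        sketch_transpose16_symbol(lc, oc + symbol_idx * 16);
        sketch_transpose16_symbol(ld, od + symbol_idx * 16);
    }
}

// Fills four positions with a deterministic pattern and compares the fused
// result against four sequential single-position calls.
inline bool sketch_fused_matches_sequential() {
    SketchWave8Byte in[4][16];
    u8 seed = 1;
    for (int pos = 0; pos < 4; pos++)
        for (int lane = 0; lane < 16; lane++)
            for (int s = 0; s < 8; s++) {
                in[pos][lane].symbols[s].data = seed;
                seed = static_cast<u8>(seed * 37 + 11);
            }
    u8 fused[4][128];
    u8 seq[4][128];
    sketch_transpose_16x4(in[0], in[1], in[2], in[3], fused[0], fused[1], fused[2], fused[3]);
    for (int pos = 0; pos < 4; pos++) sketch_transpose_16(in[pos], seq[pos]);
    return std::memcmp(fused, seq, sizeof(fused)) == 0;
}
```

Calling `sketch_fused_matches_sequential()` compares the two paths byte for byte. The fusion only interleaves independent work for the four positions inside one symbol loop, so the outputs should match regardless of which bit order the per-symbol transpose uses; the speedup reported above comes from how the real helper schedules on the target core, which this sketch does not model.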

Definition at line 273 of file wave8.hpp.

{
    for (int symbol_idx = 0; symbol_idx < 8; symbol_idx++) {
        u8 la[16];
        u8 lb[16];
        u8 lc[16];
        u8 ld[16];
        for (int lane = 0; lane < 16; lane++) {
            la[lane] = lane_waves_a[lane].symbols[symbol_idx].data;
            lb[lane] = lane_waves_b[lane].symbols[symbol_idx].data;
            lc[lane] = lane_waves_c[lane].symbols[symbol_idx].data;
            ld[lane] = lane_waves_d[lane].symbols[symbol_idx].data;
        }
        spread_transpose16_symbol(la, output_a + symbol_idx * 16);
        spread_transpose16_symbol(lb, output_b + symbol_idx * 16);
        spread_transpose16_symbol(lc, output_c + symbol_idx * 16);
        spread_transpose16_symbol(ld, output_d + symbol_idx * 16);
    }
}

References spread_transpose16_symbol().

Referenced by fl::wave8Transpose_16x4_pipe4().
