FastLED 3.9.15

◆ wave8_transpose_16x2_pipe2()

FASTLED_FORCE_INLINE FL_IRAM FL_OPTIMIZE_FUNCTION void fl::detail::wave8_transpose_16x2_pipe2(
    const Wave8Byte lane_waves_a[16],
    const Wave8Byte lane_waves_b[16],
    u8 output_a[16 * sizeof(Wave8Byte)],
    u8 output_b[16 * sizeof(Wave8Byte)])

Pipe2: transpose 16-lane × 2-byte-positions in one fused call.

Result is bit-identical to two sequential wave8_transpose_16 calls. The win comes from interleaving the two independent OR-trees inside the symbol loop, so that the in-order RV32 P4 can fill load-use stall cycles from position A with ALU ops from position B, and vice versa. Measured +26% / frame vs sequential calls on P4 v1.3 (#2548).
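The bit-identity claim can be illustrated with a minimal, host-side sketch. Everything prefixed `Sketch`/`sketch_` is a hypothetical stand-in, not FastLED code: the struct layout is inferred from the field accesses in the listing, and the bit order inside `sketch_transpose16_symbol` (MSB-first pulses, lanes 0-7 in the first byte and lanes 8-15 in the second) is an assumption made only so the example is concrete. It checks functional equivalence only and says nothing about the performance gain.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for the FastLED types (layout inferred, not confirmed).
struct SketchSymbol { uint8_t data; };
struct SketchWave8Byte { SketchSymbol symbols[8]; };

// Assumed layout: for each of 8 pulses (bit 7 first) emit 2 bytes,
// one holding that bit from lanes 0-7 and one from lanes 8-15.
static void sketch_transpose16_symbol(const uint8_t l[16], uint8_t out[16]) {
    for (int pulse = 0; pulse < 8; pulse++) {
        const int bit = 7 - pulse;
        uint8_t lo = 0;
        uint8_t hi = 0;
        for (int lane = 0; lane < 8; lane++) {
            lo |= static_cast<uint8_t>(((l[lane] >> bit) & 1u) << lane);
            hi |= static_cast<uint8_t>(((l[lane + 8] >> bit) & 1u) << lane);
        }
        out[2 * pulse] = lo;
        out[2 * pulse + 1] = hi;
    }
}

// Sequential form: one byte position per call.
static void sketch_transpose_16(const SketchWave8Byte lanes[16], uint8_t output[128]) {
    for (int s = 0; s < 8; s++) {
        uint8_t l[16];
        for (int lane = 0; lane < 16; lane++) {
            l[lane] = lanes[lane].symbols[s].data;
        }
        sketch_transpose16_symbol(l, output + s * 16);
    }
}

// Fused form: both byte positions share one symbol loop, mirroring the listing.
static void sketch_transpose_16x2(const SketchWave8Byte a[16], const SketchWave8Byte b[16],
                                  uint8_t out_a[128], uint8_t out_b[128]) {
    for (int s = 0; s < 8; s++) {
        uint8_t la[16];
        uint8_t lb[16];
        for (int lane = 0; lane < 16; lane++) {
            la[lane] = a[lane].symbols[s].data;
            lb[lane] = b[lane].symbols[s].data;
        }
        sketch_transpose16_symbol(la, out_a + s * 16);
        sketch_transpose16_symbol(lb, out_b + s * 16);
    }
}

// Returns true when the fused call matches two sequential calls byte for byte.
static bool sketch_fused_matches_sequential() {
    SketchWave8Byte a[16];
    SketchWave8Byte b[16];
    for (int lane = 0; lane < 16; lane++) {
        for (int s = 0; s < 8; s++) {
            a[lane].symbols[s].data = static_cast<uint8_t>(lane * 31 + s * 7 + 1);
            b[lane].symbols[s].data = static_cast<uint8_t>(lane * 17 + s * 13 + 5);
        }
    }
    uint8_t seq_a[128], seq_b[128], fused_a[128], fused_b[128];
    sketch_transpose_16(a, seq_a);
    sketch_transpose_16(b, seq_b);
    sketch_transpose_16x2(a, b, fused_a, fused_b);
    return std::memcmp(seq_a, fused_a, sizeof(seq_a)) == 0 &&
           std::memcmp(seq_b, fused_b, sizeof(seq_b)) == 0;
}
```

Calling `sketch_fused_matches_sequential()` from a test harness should return `true`, because the fused loop performs exactly the same per-symbol work as the two sequential passes and only changes the order in which the two positions are visited.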

Definition at line 249 of file wave8.hpp.

{
    for (int symbol_idx = 0; symbol_idx < 8; symbol_idx++) {
        u8 la[16];
        u8 lb[16];
        // Gather the current symbol byte from all 16 lanes, for both byte positions.
        for (int lane = 0; lane < 16; lane++) {
            la[lane] = lane_waves_a[lane].symbols[symbol_idx].data;
            lb[lane] = lane_waves_b[lane].symbols[symbol_idx].data;
        }
        // Each symbol yields 16 output bytes; the two transposes are independent of each other.
        spread_transpose16_symbol(la, output_a + symbol_idx * 16);
        spread_transpose16_symbol(lb, output_b + symbol_idx * 16);
    }
}
FASTLED_FORCE_INLINE FL_IRAM FL_OPTIMIZE_FUNCTION void spread_transpose16_symbol(const u8 l[16], u8 out[16])
Transpose one symbol of 16 lanes (16 input bytes) into 16 output bytes: 8 pulses × 2 bytes,...

References spread_transpose16_symbol().

Referenced by fl::wave8Transpose_16x2_pipe2().
