◆ wave8Transpose_16x4_bf1_pipe4()

FL_OPTIMIZE_FUNCTION FL_IRAM void fl::wave8Transpose_16x4_bf1_pipe4(
    const u8 (&lanes_a)[16],
    const u8 (&lanes_b)[16],
    const u8 (&lanes_c)[16],
    const u8 (&lanes_d)[16],
    const Wave8ByteExpansionLut& lut,
    u8 (&output_a)[16 * sizeof(Wave8Byte)],
    u8 (&output_b)[16 * sizeof(Wave8Byte)],
    u8 (&output_c)[16 * sizeof(Wave8Byte)],
    u8 (&output_d)[16 * sizeof(Wave8Byte)])

BF1 + pipe4: 4-position-pipelined direct encode (#2548 deep-dive).
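"4-position-pipelined" refers to handling the four 16-lane groups (a, b, c, d) together rather than one after another. The sketch below illustrates only that general interleaving idea. It assumes the first step for each group is building one 16-bit lane mask per bit position, MSB first. The function name `lane_masks_pipe4` and that decomposition are hypothetical, not taken from wave8.cpp.hpp. Keeping four independent accumulators in the same loop body lets a pipelined core overlap their loads and OR chains.

```cpp
#include <cstdint>

using u8 = std::uint8_t;
using u16 = std::uint16_t;

// Hypothetical sketch of "pipe4"-style interleaving, not the wave8.cpp.hpp code.
// For each bit position (MSB first, an assumed order) it builds one 16-bit lane
// mask per group: bit `lane` is set when that lane's byte has the bit set.
void lane_masks_pipe4(const u8 (&a)[16], const u8 (&b)[16],
                      const u8 (&c)[16], const u8 (&d)[16],
                      u16 (&ma)[8], u16 (&mb)[8],
                      u16 (&mc)[8], u16 (&md)[8]) {
    for (int i = 0; i < 8; ++i) {
        const int bit = 7 - i;
        unsigned acc_a = 0, acc_b = 0, acc_c = 0, acc_d = 0;
        for (int lane = 0; lane < 16; ++lane) {
            // Four independent dependency chains per iteration: none of these
            // ORs waits on another group's result, so they can overlap.
            acc_a |= ((a[lane] >> bit) & 1u) << lane;
            acc_b |= ((b[lane] >> bit) & 1u) << lane;
            acc_c |= ((c[lane] >> bit) & 1u) << lane;
            acc_d |= ((d[lane] >> bit) & 1u) << lane;
        }
        ma[i] = static_cast<u16>(acc_a);
        mb[i] = static_cast<u16>(acc_b);
        mc[i] = static_cast<u16>(acc_c);
        md[i] = static_cast<u16>(acc_d);
    }
}
```

A single loop over four groups trades a little register pressure for fewer stalls. Whether the real kernel interleaves at this granularity is not stated here.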

Combines BF1's algorithmic reduction with pipe4's cross-position instruction-level parallelism (ILP). It is the fastest of all prototypes measured: 9 651 → 1 757 µs/frame, a 5.49× speedup, on P4 v1.3 with 16 lanes × 256 LEDs. That is 4.4× faster than the 7 680 µs WS2812B 16-lane TX target (the time to transmit one frame), so encoding takes only about 23% of the TX time and leaves ample headroom for ISR-driven chunked streaming.
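The quoted figures can be checked with a few constants. The 1.25 µs bit period is the nominal WS2812B 800 kbit/s timing and 24 bits per LED is the standard GRB format; the two encode times are the measured values above. The constant names are illustrative only.

```cpp
// Worked check of the timing figures quoted above (illustrative names).
constexpr double kBitPeriodUs = 1.25;  // nominal WS2812B bit period (800 kbit/s)
constexpr int kBitsPerLed = 24;        // 8 bits each for G, R, B
constexpr int kLedsPerLane = 256;

// The 16 lanes transmit in parallel, so one frame takes as long as one lane.
constexpr double kTxFrameUs = kBitPeriodUs * kBitsPerLed * kLedsPerLane;
static_assert(kTxFrameUs == 7680.0, "256 LEDs x 24 bits x 1.25 us");

constexpr double kBaselineEncodeUs = 9651.0;  // measured, before BF1 + pipe4
constexpr double kPipe4EncodeUs = 1757.0;     // measured, BF1 + pipe4

constexpr double kSpeedup = kBaselineEncodeUs / kPipe4EncodeUs;   // about 5.49
constexpr double kTxHeadroom = kTxFrameUs / kPipe4EncodeUs;       // about 4.37, quoted as 4.4
constexpr double kEncodeShareOfTx = kPipe4EncodeUs / kTxFrameUs;  // about 0.229
```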

Definition at line 198 of file wave8.cpp.hpp.

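{
    // Wave pattern for a 0 bit: symbol 0 of the all-zeros byte's expansion.
    const u8 W0 = lut.lut[0x00].symbols[0].data;
    // Wave pattern for a 1 bit: symbol 0 of the all-ones byte's expansion.
    const u8 W1 = lut.lut[0xFF].symbols[0].data;
    // Encode and transpose all four 16-lane groups using only W0 and W1.
    detail::wave8_transpose_16x4_bf1_pipe4(lanes_a, lanes_b, lanes_c, lanes_d,
                                            W0, W1,
                                            output_a, output_b, output_c, output_d);
}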
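The wrapper reads just two bytes from the LUT, the patterns for a 0 bit (W0) and a 1 bit (W1). With only two patterns, each output word can be selected from a lane mask instead of being looked up per lane. The scalar sketch below shows that selection for one 16-lane group. It assumes 8 pulse slots per bit, MSB-first bit and slot order, and a slot-major layout with two bytes per slot (lanes 0 to 7, then lanes 8 to 15). That gives 128 bytes, matching `16 * sizeof(Wave8Byte)` only if `sizeof(Wave8Byte)` is 8. The name `direct_encode_16lanes` and the layout are hypothetical and may differ from `detail::wave8_transpose_16x4_bf1_pipe4()`.

```cpp
#include <cstddef>
#include <cstdint>

using u8 = std::uint8_t;
using u16 = std::uint16_t;

// Hypothetical scalar reference for direct encode from W0/W1 (assumed layout).
void direct_encode_16lanes(const u8 (&lanes)[16], u8 w0, u8 w1, u8 (&out)[128]) {
    std::size_t o = 0;
    for (int bit = 7; bit >= 0; --bit) {
        // Mask of lanes whose current data bit is 1; the rest transmit a 0 bit.
        u16 ones = 0;
        for (int lane = 0; lane < 16; ++lane) {
            ones = static_cast<u16>(ones | (((lanes[lane] >> bit) & 1u) << lane));
        }
        const u16 zeros = static_cast<u16>(~ones);
        for (int slot = 7; slot >= 0; --slot) {
            // Lanes sending 1 take W1's slot bit; lanes sending 0 take W0's.
            const u16 word = static_cast<u16>(
                (((w1 >> slot) & 1u) ? ones : 0u) |
                (((w0 >> slot) & 1u) ? zeros : 0u));
            out[o++] = static_cast<u8>(word & 0xFF);  // lanes 0 to 7
            out[o++] = static_cast<u8>(word >> 8);    // lanes 8 to 15
        }
    }
}
```

For example, with made-up patterns `w0 = 0xC0` and `w1 = 0xFC` (not real LUT values) and all lanes zero, every bit period starts with two all-high slots followed by six all-low slots. The cost per output word is two ANDs and an OR regardless of lane count, which is where a reduction over per-lane LUT expansion would come from.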

References FL_RESTRICT_PARAM, W0, W1, and fl::detail::wave8_transpose_16x4_bf1_pipe4().
