◆ wave8_transpose_16()

FASTLED_FORCE_INLINE FL_IRAM FL_OPTIMIZE_FUNCTION void fl::detail::wave8_transpose_16(
    const Wave8Byte lane_waves[16], u8 output[16 * sizeof(Wave8Byte)])

Transpose 16 lanes of Wave8Byte data into interleaved format.

Parameters

lane_waves	Array of 16 Wave8Byte structures, one per lane
output	Output buffer of 16 * sizeof(Wave8Byte) bytes (128 bytes)

Spread-LUT transpose (#2533). On the ESP32-P4 (RV32) it is about 1.98× faster than the unrolled naive implementation (6649 → 3353 µs/frame) and produces bit-exact output.

Three alternatives were slower on this in-order core: a two-pass transpose (two 8x8 transposes via u32), u64 SWAR, and the Hacker's Delight transpose. The winning approach uses independent table lookups combined by shift and OR-reduce on native u32 values, with no dependency chain. See bit_spread_lut.hpp.

Each 16-lane sample occupies 2 output bytes: the low byte holds lanes 0-7 and the high byte holds lanes 8-15.

Definition at line 231 of file wave8.hpp.

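{
    // Process one symbol (8 bit-samples) at a time across all 16 lanes.
    for (int symbol_idx = 0; symbol_idx < 8; symbol_idx++) {
        // Gather this symbol's byte from each lane.
        u8 l[16];
        for (int lane = 0; lane < 16; lane++) {
            l[lane] = lane_waves[lane].symbols[symbol_idx].data;
        }
        // Write 16 transposed bytes (8 samples x 2 bytes) at this symbol's offset.
        spread_transpose16_symbol(l, output + symbol_idx * 16);
    }
}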

References spread_transpose16_symbol().

Referenced by both overloads of fl::wave8Transpose_16().
