FastLED 3.9.15
bit_spread_lut.hpp File Reference

Detailed Description

Shared u32 "spread LUT" bit-matrix transpose primitive (no SIMD, no u64).

Benchmarked (#2533) on the ESP32-P4 (RV32, in-order): roughly 2× faster than the unrolled naive transpose, with bit-exact output (16-lane: 6649 → 3353 µs, 1.98×; 8-lane: 3356 → 1764 µs, 1.90×).

For each output word it computes acc = OR over the 8 lanes of (spread(laneByte) << lane), where spread() pre-positions a byte's pulse bits at bit 0 of 8 separate bytes (held as two u32 words; see spreadA and spreadB). All operations are native u32 and independent (no dependency chain, no emulated u64), which is exactly what the in-order core schedules best, and the tiny table is cache-resident.

Used by both wave8 (8 symbols) and wave3 (3 symbols); the per-symbol operation is an 8-bit transpose, identical for both.
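The accumulation described above can be sketched as follows. This is an illustrative reconstruction, not the library source: `spread4`, `naive_transpose8`, `spread_transpose8_sketch` and `check_transpose8` are hypothetical names, `spread4` computes with shifts what the real code reads from a table, and "byte 0" is assumed to be the least significant byte of the u32.

```cpp
#include <cassert>
#include <cstdint>

using u8 = std::uint8_t;
using u32 = std::uint32_t;

// Hypothetical stand-in for spread(): copies bits hi, hi-1, hi-2, hi-3 of v
// to bit 0 of bytes 0, 1, 2, 3 of the result (byte 0 = least significant,
// which is an assumption about the layout).
inline u32 spread4(u8 v, int hi) {
    u32 r = 0;
    for (int j = 0; j < 4; ++j) {
        r |= static_cast<u32>((v >> (hi - j)) & 1u) << (8 * j);
    }
    return r;
}

// Baseline for comparison: gather one bit at a time.
// out[k] holds pulse (7 - k); bit L of each output byte comes from lane L.
inline void naive_transpose8(const u8 lanes[8], u8 out[8]) {
    for (int k = 0; k < 8; ++k) {
        u8 acc = 0;
        for (int lane = 0; lane < 8; ++lane) {
            acc |= static_cast<u8>(((lanes[lane] >> (7 - k)) & 1u) << lane);
        }
        out[k] = acc;
    }
}

// Spread form: acc = OR over lanes of spread(laneByte) << lane,
// using two u32 accumulators and no 64-bit arithmetic.
inline void spread_transpose8_sketch(const u8 lanes[8], u8 out[8]) {
    u32 accA = 0;  // pulses 7..4, one per byte
    u32 accB = 0;  // pulses 3..0, one per byte
    for (int lane = 0; lane < 8; ++lane) {
        accA |= spread4(lanes[lane], 7) << lane;
        accB |= spread4(lanes[lane], 3) << lane;
    }
    for (int j = 0; j < 4; ++j) {
        out[j] = static_cast<u8>(accA >> (8 * j));
        out[4 + j] = static_cast<u8>(accB >> (8 * j));
    }
}

// Self-check: both forms agree on an arbitrary input.
inline void check_transpose8() {
    const u8 in[8] = {0x80, 0x01, 0xFF, 0x00, 0xA5, 0x5A, 0x3C, 0xC3};
    u8 a[8];
    u8 b[8];
    naive_transpose8(in, a);
    spread_transpose8_sketch(in, b);
    for (int i = 0; i < 8; ++i) {
        assert(a[i] == b[i]);
    }
}
```

Because each spread value has its bits at positions 0, 8, 16 and 24, shifting left by a lane index of at most 7 never carries a bit into the neighbouring byte, so the OR accumulates all eight lanes without interference.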

Definition in file bit_spread_lut.hpp.


Namespaces

namespace  fl
 Base definition for an LED controller.
 
namespace  fl::detail
 Compile-time linker keep-alive hook for a single fl::Bus.
 

Functions

FASTLED_FORCE_INLINE FL_IRAM FL_OPTIMIZE_FUNCTION void fl::detail::spread_transpose16_symbol (const u8 l[16], u8 out[16])
 Transpose one symbol of 16 lanes (16 input bytes) into 16 output bytes: 8 pulses × 2 bytes, low byte = lanes 0-7, high byte = lanes 8-15, pulse order 7..0 (out[0] = pulse 7 low).
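 The 16-byte output layout described for the 16-lane variant can be illustrated with a small sketch. It is a reading of the description rather than the actual implementation: `spread_nibble`, `spread_transpose16_sketch` and `check_transpose16` are hypothetical names, the spread value is computed with shifts instead of a table lookup, and byte 0 is assumed to be the least significant byte.

```cpp
#include <cassert>
#include <cstdint>

using u8 = std::uint8_t;
using u32 = std::uint32_t;

// Hypothetical helper: bit 3..0 of nibble n goes to bit 0 of byte 0..3.
inline u32 spread_nibble(u32 n) {
    return ((n >> 3) & 1u) | (((n >> 2) & 1u) << 8) |
           (((n >> 1) & 1u) << 16) | ((n & 1u) << 24);
}

// Sketch of the documented layout: out[2k] = pulse (7 - k) for lanes 0-7,
// out[2k + 1] = pulse (7 - k) for lanes 8-15.
inline void spread_transpose16_sketch(const u8 l[16], u8 out[16]) {
    u32 loA = 0, loB = 0;  // lanes 0-7: pulses 7..4 and 3..0
    u32 hiA = 0, hiB = 0;  // lanes 8-15: pulses 7..4 and 3..0
    for (int lane = 0; lane < 8; ++lane) {
        loA |= spread_nibble(l[lane] >> 4) << lane;
        loB |= spread_nibble(l[lane] & 0x0Fu) << lane;
        hiA |= spread_nibble(l[lane + 8] >> 4) << lane;
        hiB |= spread_nibble(l[lane + 8] & 0x0Fu) << lane;
    }
    for (int j = 0; j < 4; ++j) {
        out[2 * j] = static_cast<u8>(loA >> (8 * j));          // pulse 7-j, low
        out[2 * j + 1] = static_cast<u8>(hiA >> (8 * j));      // pulse 7-j, high
        out[8 + 2 * j] = static_cast<u8>(loB >> (8 * j));      // pulse 3-j, low
        out[8 + 2 * j + 1] = static_cast<u8>(hiB >> (8 * j));  // pulse 3-j, high
    }
}

// Self-check against a bit-by-bit reference for one arbitrary input.
inline void check_transpose16() {
    u8 in[16];
    for (int i = 0; i < 16; ++i) {
        in[i] = static_cast<u8>(i * 37 + 11);
    }
    u8 out[16];
    spread_transpose16_sketch(in, out);
    for (int k = 0; k < 8; ++k) {
        for (int lane = 0; lane < 16; ++lane) {
            const unsigned expected = (in[lane] >> (7 - k)) & 1u;
            const unsigned actual = (out[2 * k + lane / 8] >> (lane % 8)) & 1u;
            assert(expected == actual);
        }
    }
}
```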
 
FASTLED_FORCE_INLINE FL_IRAM FL_OPTIMIZE_FUNCTION void fl::detail::spread_transpose8_symbol (const u8 l[8], u8 out[8])
 Transpose one symbol of 8 lanes (8 input bytes) into 8 output bytes: 8 pulses × 1 byte (bit L = lane L), pulse order 7..0 (out[0] = pulse 7).
 
FASTLED_FORCE_INLINE u32 fl::detail::spreadA (u8 v)
 Pulses 7,6,5,4 of v (byte j = bit (7-j)). Depends only on the high nibble.
 
FASTLED_FORCE_INLINE u32 fl::detail::spreadB (u8 v)
 Pulses 3,2,1,0 of v (byte j = bit (3-j)). Depends only on the low nibble.
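 One way to read the two descriptions above is that both helpers index the same 16-entry table, once with the high nibble and once with the low nibble. The sketch below assumes exactly that; `kTable`, `spreadA_sketch`, `spreadB_sketch` and `check_spread` are hypothetical names, and the table values assume byte 0 is the least significant byte of the u32.

```cpp
#include <cassert>
#include <cstdint>

using u8 = std::uint8_t;
using u32 = std::uint32_t;

// Assumed table contents: byte0 = bit3(n), byte1 = bit2(n),
// byte2 = bit1(n), byte3 = bit0(n), each stored at bit 0 of its byte.
constexpr u32 kTable[16] = {
    0x00000000u, 0x01000000u, 0x00010000u, 0x01010000u,
    0x00000100u, 0x01000100u, 0x00010100u, 0x01010100u,
    0x00000001u, 0x01000001u, 0x00010001u, 0x01010001u,
    0x00000101u, 0x01000101u, 0x00010101u, 0x01010101u,
};

// Pulses 7..4: byte j carries bit (7 - j) of v. Only the high nibble matters.
inline u32 spreadA_sketch(u8 v) { return kTable[v >> 4]; }

// Pulses 3..0: byte j carries bit (3 - j) of v. Only the low nibble matters.
inline u32 spreadB_sketch(u8 v) { return kTable[v & 0x0Fu]; }

// Self-check of the documented byte-to-bit mapping for every input byte.
inline void check_spread() {
    for (unsigned v = 0; v < 256; ++v) {
        const u32 a = spreadA_sketch(static_cast<u8>(v));
        const u32 b = spreadB_sketch(static_cast<u8>(v));
        for (int j = 0; j < 4; ++j) {
            assert(((a >> (8 * j)) & 0xFFu) == ((v >> (7 - j)) & 1u));
            assert(((b >> (8 * j)) & 0xFFu) == ((v >> (3 - j)) & 1u));
        }
    }
}
```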
 

Variables

constexpr u32 fl::detail::kSpreadNibble [16]
 kSpreadNibble[n] places the 4 bits of nibble n at bit 0 of 4 separate bytes: byte0 = bit3(n), byte1 = bit2(n), byte2 = bit1(n), byte3 = bit0(n).
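 The byte mapping above fully determines the 16 entries once a byte order is fixed. As a hedged illustration, the sketch below derives them at compile time, assuming byte0 is the least significant byte of the u32; `spread_nibble`, `kSpreadNibbleSketch` and `check_table` are hypothetical names, not the library's definitions.

```cpp
#include <cassert>
#include <cstdint>

using u32 = std::uint32_t;

// Entry for nibble n, assuming byte0 occupies bits 0..7 of the word.
constexpr u32 spread_nibble(u32 n) {
    return ((n >> 3) & 1u)             // byte0 = bit3(n)
           | (((n >> 2) & 1u) << 8)    // byte1 = bit2(n)
           | (((n >> 1) & 1u) << 16)   // byte2 = bit1(n)
           | ((n & 1u) << 24);         // byte3 = bit0(n)
}

constexpr u32 kSpreadNibbleSketch[16] = {
    spread_nibble(0),  spread_nibble(1),  spread_nibble(2),  spread_nibble(3),
    spread_nibble(4),  spread_nibble(5),  spread_nibble(6),  spread_nibble(7),
    spread_nibble(8),  spread_nibble(9),  spread_nibble(10), spread_nibble(11),
    spread_nibble(12), spread_nibble(13), spread_nibble(14), spread_nibble(15),
};

static_assert(kSpreadNibbleSketch[0x8] == 0x00000001u, "bit3 lands in byte0");
static_assert(kSpreadNibbleSketch[0x1] == 0x01000000u, "bit0 lands in byte3");
static_assert(kSpreadNibbleSketch[0xF] == 0x01010101u, "all four bits set");

// Runtime check that every entry follows the documented mapping.
inline void check_table() {
    for (u32 n = 0; n < 16; ++n) {
        for (int j = 0; j < 4; ++j) {
            assert(((kSpreadNibbleSketch[n] >> (8 * j)) & 0xFFu) ==
                   ((n >> (3 - j)) & 1u));
        }
    }
}
```

 Keeping each bit at bit 0 of its own byte leaves seven free positions above it, which is what allows a later left shift by the lane index (0 to 7) without spilling into the next byte.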