FastLED 3.9.15
Shared u32 "spread LUT" bit-matrix transpose primitive (no SIMD, no u64).
Benchmarked (#2533) on the ESP32-P4 (RV32, in-order core): about 2× faster than the unrolled naive transpose, with bit-exact output (16 lanes: 6649 → 3353 µs, 1.98×; 8 lanes: 3356 → 1764 µs, 1.90×).

For each output it computes acc = OR over the 8 lanes of (spread(laneByte) << lane), where spread() pre-positions a byte's 8 pulse bits at bit 0 of 8 separate bytes (spreadA covers pulses 7..4, spreadB pulses 3..0). Because lane < 8, the shift moves each bit to position lane inside its own byte without spilling into the next one. All operations are native u32 and independent (no dependency chain, no emulated u64), which is exactly what the in-order core schedules best, and the 64-byte table (16 × u32) stays cache-resident.

It is used by both wave8 (8 symbols) and wave3 (3 symbols); the per-symbol operation is the same 8-bit transpose in both.
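The spread-and-OR step described above can be sketched in portable C++ as follows. This is a minimal illustration under stated assumptions, not FastLED's implementation: "byte j" of a u32 is taken to mean bits 8j to 8j+7, and the names spread_nibble, kNibble, spread_transpose8, naive_transpose8 and check_against_naive are hypothetical. The naive version is a bit-by-bit reference for the documented output layout (out[k] holds pulse 7-k, bit L = lane L).

```cpp
// Hypothetical sketch of the spread-LUT 8-lane transpose (not FastLED's source).
// Assumption: "byte j" of a u32 means bits 8j..8j+7.
#include <cassert>
#include <cstdint>

using u8 = std::uint8_t;
using u32 = std::uint32_t;

// Put bit3..bit0 of nibble n at bit 0 of bytes 0..3 (the kSpreadNibble layout).
constexpr u32 spread_nibble(u32 n) {
    return ((n >> 3) & 1u) | (((n >> 2) & 1u) << 8) |
           (((n >> 1) & 1u) << 16) | ((n & 1u) << 24);
}

// 64-byte lookup table built at compile time from the formula above.
constexpr u32 kNibble[16] = {
    spread_nibble(0),  spread_nibble(1),  spread_nibble(2),  spread_nibble(3),
    spread_nibble(4),  spread_nibble(5),  spread_nibble(6),  spread_nibble(7),
    spread_nibble(8),  spread_nibble(9),  spread_nibble(10), spread_nibble(11),
    spread_nibble(12), spread_nibble(13), spread_nibble(14), spread_nibble(15),
};

// Spread transpose: out[k] bit L = bit (7 - k) of lane L.
void spread_transpose8(const u8 l[8], u8 out[8]) {
    u32 accA = 0;  // byte j = pulse 7 - j (pulses 7..4)
    u32 accB = 0;  // byte j = pulse 3 - j (pulses 3..0)
    for (int lane = 0; lane < 8; ++lane) {
        // Each lookup and shift depends only on its own lane byte; a shift < 8
        // moves the bit to position `lane` without leaving its byte.
        accA |= kNibble[l[lane] >> 4] << lane;
        accB |= kNibble[l[lane] & 0x0F] << lane;
    }
    for (int j = 0; j < 4; ++j) {
        out[j] = static_cast<u8>(accA >> (8 * j));
        out[4 + j] = static_cast<u8>(accB >> (8 * j));
    }
}

// Bit-by-bit reference with the same output layout.
void naive_transpose8(const u8 l[8], u8 out[8]) {
    for (int k = 0; k < 8; ++k) {
        u8 b = 0;
        for (int lane = 0; lane < 8; ++lane)
            b |= static_cast<u8>(((l[lane] >> (7 - k)) & 1u) << lane);
        out[k] = b;
    }
}

// Assert that both versions agree for one symbol.
void check_against_naive(const u8 l[8]) {
    u8 a[8], b[8];
    naive_transpose8(l, a);
    spread_transpose8(l, b);
    for (int k = 0; k < 8; ++k) assert(a[k] == b[k]);
}
```

For example, a single set bit in lane 0 at pulse 7 (l[0] = 0x80, other lanes 0) ends up as out[0] = 0x01, with every other output byte zero.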
Definition in file bit_spread_lut.hpp.
Namespaces
| Namespace | Description |
|---|---|
| fl | Base definition for an LED controller. |
| fl::detail | Compile-time linker keep-alive hook for a single fl::Bus. |
Functions
| Return type | Function | Description |
|---|---|---|
| FASTLED_FORCE_INLINE FL_IRAM FL_OPTIMIZE_FUNCTION void | fl::detail::spread_transpose16_symbol(const u8 l[16], u8 out[16]) | Transpose one symbol of 16 lanes (16 input bytes) into 16 output bytes: 8 pulses × 2 bytes (low byte = lanes 0-7, high byte = lanes 8-15), pulse order 7..0 (out[0] = pulse 7, low byte). |
| FASTLED_FORCE_INLINE FL_IRAM FL_OPTIMIZE_FUNCTION void | fl::detail::spread_transpose8_symbol(const u8 l[8], u8 out[8]) | Transpose one symbol of 8 lanes (8 input bytes) into 8 output bytes: 8 pulses × 1 byte (bit L = lane L), pulse order 7..0 (out[0] = pulse 7). |
| FASTLED_FORCE_INLINE u32 | fl::detail::spreadA(u8 v) | Pulses 7, 6, 5, 4 of v (byte j = bit (7-j)). Depends only on the high nibble. |
| FASTLED_FORCE_INLINE u32 | fl::detail::spreadB(u8 v) | Pulses 3, 2, 1, 0 of v (byte j = bit (3-j)). Depends only on the low nibble. |
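Because shifting a spread word by a lane index of 8 or more would carry bits into the neighbouring byte, one plausible way to produce the 16-lane layout is to run the 8-lane spread twice (lanes 0-7, then lanes 8-15) and interleave the results. The sketch below assumes out[2k] is the low byte and out[2k + 1] the high byte of pulse 7-k, which is one reading of the spread_transpose16_symbol description; all names are hypothetical, and spread_nibble computes a kSpreadNibble entry arithmetically instead of looking it up.

```cpp
// Hypothetical sketch of the 16-lane layout (not FastLED's source).
// Assumptions: byte j of a u32 = bits 8j..8j+7; out[2k] = low, out[2k+1] = high.
#include <cassert>
#include <cstdint>

using u8 = std::uint8_t;
using u32 = std::uint32_t;

// Same layout as kSpreadNibble, computed instead of looked up.
constexpr u32 spread_nibble(u32 n) {
    return ((n >> 3) & 1u) | (((n >> 2) & 1u) << 8) |
           (((n >> 1) & 1u) << 16) | ((n & 1u) << 24);
}

// 8 lanes -> accA (byte j = pulse 7 - j) and accB (byte j = pulse 3 - j).
void spread8(const u8 l[8], u32& accA, u32& accB) {
    accA = 0;
    accB = 0;
    for (int lane = 0; lane < 8; ++lane) {
        accA |= spread_nibble(l[lane] >> 4) << lane;
        accB |= spread_nibble(l[lane] & 0x0Fu) << lane;
    }
}

// out[2k] = pulse (7 - k) for lanes 0-7, out[2k + 1] = same pulse for lanes 8-15.
void spread_transpose16(const u8 l[16], u8 out[16]) {
    u32 loA, loB, hiA, hiB;
    spread8(l, loA, loB);      // lanes 0-7  -> low bytes
    spread8(l + 8, hiA, hiB);  // lanes 8-15 -> high bytes
    for (int j = 0; j < 4; ++j) {
        out[2 * j]           = static_cast<u8>(loA >> (8 * j));  // pulse 7 - j
        out[2 * j + 1]       = static_cast<u8>(hiA >> (8 * j));
        out[2 * (4 + j)]     = static_cast<u8>(loB >> (8 * j));  // pulse 3 - j
        out[2 * (4 + j) + 1] = static_cast<u8>(hiB >> (8 * j));
    }
}

// Reference: bit (lane % 8) of out[2k + lane / 8] = bit (7 - k) of that lane.
void naive_transpose16(const u8 l[16], u8 out[16]) {
    for (int i = 0; i < 16; ++i) out[i] = 0;
    for (int k = 0; k < 8; ++k)
        for (int lane = 0; lane < 16; ++lane)
            out[2 * k + lane / 8] |=
                static_cast<u8>(((l[lane] >> (7 - k)) & 1u) << (lane % 8));
}

// Assert that both versions agree for one symbol.
void check16(const u8 l[16]) {
    u8 a[16], b[16];
    naive_transpose16(l, a);
    spread_transpose16(l, b);
    for (int i = 0; i < 16; ++i) assert(a[i] == b[i]);
}
```

Splitting into two 8-lane groups keeps every shift below 8, so the same carry-free argument as in the 8-lane case applies to each half.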
Variables
| Type | Variable | Description |
|---|---|---|
| constexpr u32 | fl::detail::kSpreadNibble[16] | kSpreadNibble[n] places the 4 bits of nibble n at bit 0 of 4 separate bytes: byte0 = bit3(n), byte1 = bit2(n), byte2 = bit1(n), byte3 = bit0(n). |
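Written out, the 16 table entries follow directly from the kSpreadNibble layout above. The sketch below lists them, checks them at compile time against that layout, and shows spreadA/spreadB as plain table lookups. The names carry a Sketch suffix (and layout, table_matches are likewise hypothetical) because they are illustrative stand-ins rather than FastLED's definitions, and the byte numbering (byte j = bits 8j to 8j+7) is an assumption.

```cpp
// Illustrative stand-ins for kSpreadNibble / spreadA / spreadB (hypothetical names).
// Assumption: "byte j" of a u32 means bits 8j..8j+7.
#include <cstdint>

using u8 = std::uint8_t;
using u32 = std::uint32_t;

// Entry n: byte0 = bit3(n), byte1 = bit2(n), byte2 = bit1(n), byte3 = bit0(n).
constexpr u32 kSpreadNibbleSketch[16] = {
    0x00000000, 0x01000000, 0x00010000, 0x01010000,  // n = 0..3
    0x00000100, 0x01000100, 0x00010100, 0x01010100,  // n = 4..7
    0x00000001, 0x01000001, 0x00010001, 0x01010001,  // n = 8..11
    0x00000101, 0x01000101, 0x00010101, 0x01010101,  // n = 12..15
};

// The documented layout, computed bit by bit.
constexpr u32 layout(u32 n) {
    return ((n >> 3) & 1u) | (((n >> 2) & 1u) << 8) |
           (((n >> 1) & 1u) << 16) | ((n & 1u) << 24);
}

// C++11-style recursive check of every entry (stops before index 16).
constexpr bool table_matches(u32 n) {
    return n == 16 || (kSpreadNibbleSketch[n] == layout(n) && table_matches(n + 1));
}
static_assert(table_matches(0), "table disagrees with the documented layout");

// Pulses 7..4 of v: byte j = bit (7 - j). Uses only the high nibble.
constexpr u32 spreadA_sketch(u8 v) { return kSpreadNibbleSketch[v >> 4]; }
// Pulses 3..0 of v: byte j = bit (3 - j). Uses only the low nibble.
constexpr u32 spreadB_sketch(u8 v) { return kSpreadNibbleSketch[v & 0x0F]; }

static_assert(spreadA_sketch(0xA5) == 0x00010001u, "0xA = 1010 -> pulses 7 and 5");
static_assert(spreadB_sketch(0xA5) == 0x01000100u, "0x5 = 0101 -> pulses 2 and 0");
```

For v = 0xA5 (binary 1010 0101), spreadA_sketch yields 0x00010001 (pulse 7 = 1, pulse 6 = 0, pulse 5 = 1, pulse 4 = 0) and spreadB_sketch yields 0x01000100 (pulse 3 = 0, pulse 2 = 1, pulse 1 = 0, pulse 0 = 1). Splitting the byte into two nibble lookups keeps the table at 16 entries instead of 256.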